Minyoung Lee - Vishing with acoustic feature

Voice Phishing Detection with Acoustic Voice Feature

Experience in Anomaly Detection Using Voice Data
- Developed a model to detect phishing based solely on the acoustic features of voice using a real voice phishing dataset.
- Achieved high detection accuracy, even in real-time conversational settings, using three acoustic feature extraction methods (STFT, MEL, MFCC).
Providing a Testing Environment for Interactive Settings
- Built an interactive dataset for detecting voice phishing in real conversational environments by mixing phishing and normal conversation data.
- Proposed and evaluated an effective phishing detection model in interactive scenarios.

Introduction

Voice phishing is a serious social issue that can lead to financial losses, necessitating the development of technology to detect it in real-time conversational situations.
Existing studies have focused mainly on detection from recorded voice or from text converted from speech, both of which limit real-time detection.
New datasets and models are needed that enable real-time voice phishing detection from acoustic features alone in actual conversational contexts.

Develop a voice phishing detection model based on acoustic features using a real conversational voice phishing dataset.
Build and evaluate a model that can detect phishing with high accuracy in real-time conversational environments.

Built an interactive dataset reflecting the acoustic characteristics of phishing criminals using actual voice phishing case data from the Financial Supervisory Service of Korea.
Collected a total of 723 actual voice phishing conversation samples, which were divided into 36,905 utterance units for analysis and model training.

STFT: Extracted time-frequency information by applying the Short-Time Fourier Transform to the voice data at short intervals.
MEL: Extracted mel-spectrogram features, which map the frequency axis onto the mel scale to approximate how humans perceive pitch.
MFCC: Extracted Mel-Frequency Cepstral Coefficients, a compact set of coefficients that summarizes the spectral envelope of the voice signal.
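
The three feature types build on one another, which a short sketch can make concrete. The code below is a minimal NumPy illustration, not the project's actual pipeline: the frame length (`n_fft=400`), hop size (`hop=160`), number of mel filters (`n_mels=40`), number of coefficients (`n_mfcc=13`), and the 16 kHz test tone are all assumed values chosen for the example.

```python
import numpy as np

def stft_magnitude(signal, n_fft=400, hop=160):
    """Magnitude STFT: Hann-windowed frames followed by a real FFT.
    Returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=1))

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters spaced evenly on the mel scale.
    Returns an array of shape (n_mels, n_fft // 2 + 1)."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel(spec, bank):
    """Log of mel-band energies computed from the power spectrum."""
    return np.log((spec ** 2) @ bank.T + 1e-10)

def mfcc(log_mel_spec, n_mfcc=13):
    """Unnormalized DCT-II along the mel axis, keeping the first n_mfcc coefficients."""
    n_mels = log_mel_spec.shape[1]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel_spec @ basis.T

# Assumed test input: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)

spec = stft_magnitude(tone)                     # STFT feature
mel_spec = log_mel(spec, mel_filterbank(sr, 400))  # MEL feature
coeffs = mfcc(mel_spec)                         # MFCC feature
```

Each step reduces dimensionality: 201 frequency bins per frame for the STFT, 40 mel bands, then 13 cepstral coefficients, which is one reason MFCC inputs suit small, fast models.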

Evaluated voice phishing detection performance using classical machine learning (ML) models such as SVM, Logistic Regression (LR), Decision Tree, and Random Forest, as well as deep learning (DL) models such as DenseNet and LSTM.
Kept the model architectures compact so that training and evaluation stay fast enough for real-time detection.
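
A comparison of the four classical models could be set up roughly as follows. This is a hedged sketch using scikit-learn with default hyperparameters, which are assumptions rather than the project's settings. The feature matrix is random placeholder data standing in for per-segment acoustic features, so the scores it produces say nothing about real phishing detection.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Placeholder data (assumption): 13 features per segment, two synthetic classes.
# Label 1 stands in for phishing segments, label 0 for normal conversation.
n_per_class, n_features = 100, 13
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, n_features)),
    rng.normal(1.0, 1.0, size=(n_per_class, n_features)),
])
y = np.array([0] * n_per_class + [1] * n_per_class)

# The four classical models named in the text; scaling is added for SVM and LR.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy per model.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

With real data, `X` would hold the extracted STFT, MEL, or MFCC features for each segment, and the same loop would give a like-for-like comparison across models.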

Split the data into segments of 0.5 to 2.0 seconds to evaluate whether each model can detect phishing from inputs of different lengths.
Mixed voice phishing and normal conversation data to experiment with detection performance in conversational environments.
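
The segmentation and mixing steps can be sketched in a few lines. The page does not describe how the two sources were combined, so the shuffling scheme below is an assumption, as are the function names `split_fixed` and `build_mixed_stream`, the 16 kHz sample rate, and the silent placeholder waveforms.

```python
import random

def split_fixed(samples, sr, seconds):
    """Cut a waveform into non-overlapping segments of the given duration.
    A trailing remainder shorter than one segment is dropped."""
    size = int(round(sr * seconds))
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def build_mixed_stream(phishing, normal, sr, seconds, seed=0):
    """Label segments (1 = phishing, 0 = normal) and shuffle them into one stream.
    Shuffling is an assumed mixing scheme, used here only for illustration."""
    labeled = [(seg, 1) for seg in split_fixed(phishing, sr, seconds)]
    labeled += [(seg, 0) for seg in split_fixed(normal, sr, seconds)]
    random.Random(seed).shuffle(labeled)
    return labeled

# Placeholder waveforms (assumption): 3 s of "phishing" and 2 s of "normal" audio.
sr = 16000
phishing_audio = [0.0] * (sr * 3)
normal_audio = [0.0] * (sr * 2)

# Number of labeled segments produced at each segment length.
counts = {}
for seconds in (0.5, 1.0, 1.5, 2.0):
    stream = build_mixed_stream(phishing_audio, normal_audio, sr, seconds)
    counts[seconds] = len(stream)
```

Shorter segments yield more training examples from the same audio, at the cost of less context per example.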

Recorded high detection accuracy (over 97%) across all acoustic features (STFT, MEL, MFCC).
Notably, ML models (SVM, LR) using MFCC features achieved high detection performance even on segments shorter than one second, confirming the potential for early phishing detection.
Maintained high accuracy in experiments on the conversational dataset, showing that phishing detection remains effective in real-time conversation.

Confirmed that high detection accuracy is achievable based solely on the acoustic characteristics of voice phishing criminals.
Built and evaluated datasets and models that enable effective detection in real-time conversational environments.
Proposed a model structure that integrates various acoustic features to enhance phishing detection accuracy, setting the direction for future research.

More real phishing data is needed to improve detection performance across various speaking styles and situations.
Research is needed to enhance the model's versatility for use in general conversational contexts, not just voice phishing scenarios.
Future development may involve integrating phishing detection with warning and response functions in real-time conversational systems.
