UC Berkeley School of Information Fall 2024 5th Year MIDS Capstone Project
Eric Jung, Sean Wei, Parker Brailow, Naveen Sukumar
AceInterview is an AI practice interview platform that analyzes and provides feedback on user posture, speaking tone, eye contact, and other key interview behaviors.
An accessible, easy-to-use platform for college students and new grads seeking behavioral interview coaching.
To provide the most comprehensive interview feedback, several artificial intelligence models extract visual, verbal, and emotional performance features from user videos. These features are then combined within the AceInterview large language model to generate a holistic analysis of each user's interview behaviors.
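As a rough sketch of how these feature families might be brought together, the snippet below defines a hypothetical container and renders it into a prompt for the language model; AceInterview's actual internal structures and prompt wording are not shown on this page.

```python
from dataclasses import dataclass, field

# Hypothetical container mirroring the three feature families described above;
# the project's real data structures are assumptions here.
@dataclass
class InterviewFeatures:
    visual: dict = field(default_factory=dict)     # posture, hand motion, smiling
    verbal: dict = field(default_factory=dict)     # transcript stats, prosody
    emotional: dict = field(default_factory=dict)  # expression scores

def build_llm_prompt(features: InterviewFeatures) -> str:
    """Render every feature family into a single prompt for the language model."""
    lines = ["Analyze this candidate's interview given the features below."]
    for name, values in [("Visual", features.visual),
                         ("Verbal", features.verbal),
                         ("Emotional", features.emotional)]:
        lines.append(f"\n{name} features:")
        lines += [f"- {key}: {value}" for key, value in values.items()]
    return "\n".join(lines)
```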
Good posture and body language are vital to job interview success.
MediaPipe computer vision models identify body landmarks such as the eyes and hands. These landmarks are then geometrically postprocessed to extract features describing the user's posture and hand motion, and modified Haar cascades detect how often the user is smiling.
Each model was unit-tested across a range of camera angles and subject placements to demonstrate accuracy and generalizability.
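A minimal sketch of this visual step is shown below, using MediaPipe's Pose solution and OpenCV's stock smile cascade; the stock cascade stands in for the project's modified cascades, and the shoulder-tilt feature is an illustrative assumption rather than the project's exact geometry.

```python
import math

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
# OpenCV's stock smile cascade stands in for the project's modified cascades.
smile_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")

def shoulder_tilt_degrees(landmarks) -> float:
    """Example geometric postprocess: angle of the shoulder line vs. horizontal."""
    left = landmarks[mp_pose.PoseLandmark.LEFT_SHOULDER]
    right = landmarks[mp_pose.PoseLandmark.RIGHT_SHOULDER]
    return math.degrees(math.atan2(left.y - right.y, left.x - right.x))

def analyze_frame(frame, pose) -> dict:
    """Extract per-frame visual features from a BGR video frame."""
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    smiles = smile_cascade.detectMultiScale(gray, scaleFactor=1.8, minNeighbors=20)
    features = {"smiling": len(smiles) > 0}
    if results.pose_landmarks:
        features["shoulder_tilt_deg"] = shoulder_tilt_degrees(
            results.pose_landmarks.landmark)
    return features

# Usage: share one Pose instance across all frames of the video.
# with mp_pose.Pose(static_image_mode=False) as pose:
#     features = analyze_frame(frame, pose)
```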
Eloquent and articulate speech delivery is a key factor in job interviews.
The CrisperWhisper model transcribes every word in a user's video verbatim, including filler words, pauses, stutters, and false starts, all of which inform an assessment of the user's vocal confidence and presentation.
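A sketch of how this verbatim transcription might look with the Hugging Face transformers pipeline; the model id, the pre-extracted audio file, and the filler-rate feature are assumptions.

```python
from transformers import pipeline

# Load the CrisperWhisper checkpoint (model id assumed) for verbatim
# transcription that keeps fillers and word-level timestamps.
asr = pipeline(
    "automatic-speech-recognition",
    model="nyrahealth/CrisperWhisper",
    chunk_length_s=30,           # chunk long-form audio
    return_timestamps="word",    # word timings expose pauses
)

result = asr("interview_audio.wav")  # audio pre-extracted from the user video
transcript = result["text"]          # verbatim text, fillers included

# Example derived feature: filler words per spoken word (illustrative list).
FILLERS = {"um", "uh", "erm", "hmm"}
words = [w.strip(".,!?").lower() for w in transcript.split()]
filler_rate = sum(w in FILLERS for w in words) / max(len(words), 1)
```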
The Parselmouth library is used to extract prosody features such as EngagingTone, Excited, and Jitter, all of which are generally correlated with how likely an interviewer is to recommend hiring a candidate.
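Jitter, for example, can be computed directly through Parselmouth's Praat interface. The sketch below uses standard Praat calls; higher-level labels like EngagingTone are derived features, and the file name and pitch settings here are assumptions.

```python
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("interview_audio.wav")

# Pitch statistics, a common proxy for a varied, engaging tone.
pitch = snd.to_pitch()
mean_f0 = call(pitch, "Get mean", 0, 0, "Hertz")
f0_sd = call(pitch, "Get standard deviation", 0, 0, "Hertz")

# Jitter: cycle-to-cycle variation in pitch periods, via a point process.
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter_local = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
```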
An interviewee's tone and expression add another dimension to consider.
Hume's state-of-the-art expression measurement models predict scores along 48 distinct dimensions of emotion, and are applied to the candidate's vocal and facial expressions throughout the submitted video.
These dimensions range from Joy and Enthusiasm to Anxiety and Doubt, helping provide a more qualitative analysis of each user's emotional projection.
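A sketch of submitting a video to Hume's expression measurement models through its batch Python SDK; the calls follow the v0.x SDK and may differ in newer versions, and the API key and file name are placeholders.

```python
from hume import HumeBatchClient
from hume.models.config import FaceConfig, ProsodyConfig

# Placeholder API key; client calls per the v0.x batch SDK.
client = HumeBatchClient("YOUR_HUME_API_KEY")
job = client.submit_job(
    urls=[],                                  # or pass hosted video URLs
    configs=[FaceConfig(), ProsodyConfig()],  # facial + vocal expression models
    files=["interview_video.mp4"],
)
job.await_complete()
predictions = job.get_predictions()  # emotion scores over the video timeline
```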
Once all the features are calculated, they are combined within a large language model to generate detailed feedback.
A Google Gemini model is prompted with the given features to produce a Markdown-formatted output with personalized, descriptive analysis of each candidate's interview performance.
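A minimal sketch of this prompting step with the google-generativeai SDK; the model name, prompt wording, and example feature values are assumptions.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")   # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model name

# Illustrative feature summary; the real prompt assembles all extracted features.
features_markdown = (
    "- filler_rate: 0.04\n"
    "- shoulder_tilt_deg: 2.1\n"
    "- top_vocal_emotion: Calmness"
)

prompt = (
    "You are an expert interview coach. Using the features below, write a "
    "Markdown report with personalized feedback on the candidate's posture, "
    "speech delivery, and emotional expression.\n\n"
    f"Features:\n{features_markdown}"
)

response = model.generate_content(prompt)
feedback_markdown = response.text  # Markdown-formatted analysis
```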
When human evaluators and our model were asked to score interview performance in multiple categories on a scale of 1 to 7 (such as Authentic and RecommendHiring), their scores differed by less than one point on average.
Watch our video demo to see AceInterview in action!
Eric Jung
erijung@berkeley.edu
Sean Wei
seanwei2001@berkeley.edu
Parker Brailow
brailowparker@berkeley.edu
Naveen Sukumar
naveensukumar@berkeley.edu