HR Agent
posted in 2025
(I) Objective: The goal of the HR Agent is to automatically answer common employee questions, such as salary payment dates, how to apply for leave, medical and insurance coverage, and procedures for reporting workplace bullying or sexual harassment.
By automating responses to frequently asked questions, the HR Agent reduces the repetitive workload of HR staff and shortens response times.
(II) Collect data: We collected internal company documents and curated question–answer pairs for system evaluation. The dataset includes policies, guidelines, employee handbooks, workflow documents, and historical inquiries.
(III) HR Agent pipeline:
Data Cleaning and Normalization:
Text Extraction: Use OCR for PDFs/images.
Denoising: Remove boilerplate (headers, footers, company watermarks) and duplicate passages.
Normalization: Standardize encoding (UTF-8), fix garbled characters, normalize punctuation.
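A minimal sketch of the denoising and normalization steps, assuming page text has already been extracted; the boilerplate patterns are illustrative placeholders that would be tailored to the actual documents:

```python
import re
import unicodedata

# Illustrative patterns; real headers/footers/watermarks are company-specific.
BOILERPLATE_PATTERNS = [
    re.compile(r"^Page \d+ of \d+$", re.MULTILINE),
    re.compile(r"^Internal Use Only.*$", re.MULTILINE),
]

def clean_page(text: str) -> str:
    # NFKC normalization canonicalizes full-width characters and repairs
    # many encoding inconsistencies in OCR output.
    text = unicodedata.normalize("NFKC", text)
    for pattern in BOILERPLATE_PATTERNS:
        text = pattern.sub("", text)
    # Collapse runs of spaces/tabs left behind by extraction.
    return re.sub(r"[ \t]+", " ", text).strip()

def deduplicate(pages: list[str]) -> list[str]:
    # Drop exact duplicate pages while preserving order.
    seen: set[str] = set()
    unique = []
    for page in pages:
        if page not in seen:
            seen.add(page)
            unique.append(page)
    return unique
```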
Chunking / Segmentation:
Chunking Methods: First, semantic splitting (by paragraph/section); then, if any segment exceeds max_tokens, split it further at a fixed length (see the sketch after this list).
Overlap: Typically 20–50 tokens of overlap between adjacent chunks to avoid cutting off important context.
Metadata: Each chunk carries a source ID, page number, paragraph number, language, and timestamp.
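A sketch of this two-pass chunker; max_tokens, the overlap width, and the whitespace token counter are stand-ins to be replaced by the deployment's real tokenizer and limits:

```python
from dataclasses import dataclass, field

MAX_TOKENS = 512
OVERLAP = 32  # within the 20-50 token guideline above

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def split_fixed(tokens: list[str], max_tokens: int, overlap: int) -> list[list[str]]:
    # Slide a fixed-length window with overlap across an oversized segment.
    step = max_tokens - overlap
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), step)]

def chunk_document(text: str, source_id: str, page: int) -> list[Chunk]:
    chunks = []
    # Pass 1: semantic split at paragraph boundaries (blank lines).
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for para_no, para in enumerate(paragraphs):
        tokens = para.split()  # stand-in for a real tokenizer
        # Pass 2: fixed-length split only when the segment is too long.
        pieces = split_fixed(tokens, MAX_TOKENS, OVERLAP) if len(tokens) > MAX_TOKENS else [tokens]
        for piece in pieces:
            chunks.append(Chunk(
                text=" ".join(piece),
                # Language and ingestion timestamp would be added here as well.
                metadata={"source_id": source_id, "page": page, "paragraph": para_no},
            ))
    return chunks
```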
Text Embedding:
Embedding Model: Qwen3-Embedding-8B for Chinese documents.
Normalize Embeddings: Apply L2 normalization.
Generate Embeddings in Batches.
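A sketch of batched, L2-normalized embedding. Qwen3-Embedding-8B is published in sentence-transformers format, but loading it this way (and on a single GPU) is an assumption about the deployment:

```python
from sentence_transformers import SentenceTransformer

# An 8B embedding model needs a GPU with substantial memory.
model = SentenceTransformer("Qwen/Qwen3-Embedding-8B")

def embed_texts(texts: list[str], batch_size: int = 32):
    # normalize_embeddings=True applies L2 normalization, so cosine
    # similarity in the vector store reduces to a dot product.
    return model.encode(texts, batch_size=batch_size, normalize_embeddings=True)
```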
Vector Store / Index:
Vector Database: Qdrant.
Index Type: HNSW (Hierarchical Navigable Small World).
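Creating the collection in Qdrant might look like the following; the HNSW parameters are illustrative defaults, and the vector size must match the embedding model's output dimension (4096 for Qwen3-Embedding-8B):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")  # assumes a local Qdrant instance

client.create_collection(
    collection_name="hr_docs",
    vectors_config=VectorParams(size=4096, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=200),  # illustrative HNSW settings
)

def index_chunks(vectors, chunks) -> None:
    # Store each chunk's text and metadata as payload next to its vector,
    # so retrieval can return citable sources.
    client.upsert(
        collection_name="hr_docs",
        points=[
            PointStruct(id=i, vector=vec.tolist(), payload={**chunk.metadata, "text": chunk.text})
            for i, (vec, chunk) in enumerate(zip(vectors, chunks))
        ],
    )
```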
Retriever: Multi-Stage Retrieval:
Stage 1: A fast Approximate Nearest Neighbor (ANN) search fetches the top-N candidates (large N).
Stage 2: A reranker (cross-encoder or lexical) reduces the candidates to the top-k (see the sketch below).
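A sketch of the two-stage retriever. The cross-encoder named here is one common reranking choice, not something the pipeline prescribes:

```python
from qdrant_client import QdrantClient
from sentence_transformers import CrossEncoder, SentenceTransformer

embedder = SentenceTransformer("Qwen/Qwen3-Embedding-8B")
reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")  # example cross-encoder reranker
client = QdrantClient(url="http://localhost:6333")

def retrieve(query: str, top_n: int = 50, top_k: int = 5) -> list[str]:
    # Stage 1: fast ANN fetch of a large candidate set (top-N).
    query_vec = embedder.encode(query, normalize_embeddings=True)
    hits = client.search(collection_name="hr_docs",
                         query_vector=query_vec.tolist(), limit=top_n)
    candidates = [hit.payload["text"] for hit in hits]
    # Stage 2: cross-encoder rerank, keeping only the top-k.
    scores = reranker.predict([(query, text) for text in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:top_k]]
```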
Context Assembly / Prompt Building:
Direct concatenation: concatenate the top-k chunks into the prompt (sketched below).
Summarize + feed: summarize the top-k chunks first, then feed the summary to the generator.
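A minimal sketch of the direct-concatenation strategy; the prompt template is a hypothetical example, not a fixed format:

```python
PROMPT_TEMPLATE = """You are an HR assistant. Answer using ONLY the context below.
If the context does not contain the answer, say you do not know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, chunks: list[str]) -> str:
    # Number each chunk so the generated answer can cite its sources.
    context = "\n\n".join(f"[{i + 1}] {text}" for i, text in enumerate(chunks))
    return PROMPT_TEMPLATE.format(context=context, question=question)
```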
Generator:
Models: Qwen3-14B, gpt-oss-20b.
RAG-Sequence: Retrieve top-k chunks → concatenate → generate from a single context.
RAG-FiD (Fusion-in-Decoder): Retrieve top-k chunks → encode each chunk independently → the decoder fuses information across chunks.
Decoding Settings: temperature, top_p, max_tokens, repetition_penalty.
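A sketch of the RAG-Sequence path with Hugging Face transformers. The decoding values are illustrative, and in production a 14B model would more likely sit behind an inference server than be loaded inline:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B", device_map="auto")

def generate_answer(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # The decoding settings listed above: temperature, top_p,
    # max_tokens (max_new_tokens here), repetition_penalty.
    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        max_new_tokens=512,
        repetition_penalty=1.05,
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```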
Constraint Techniques:
Use system prompts to control style and format.
Constrained decoding with the outlines library, as sketched below.
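A sketch of schema-constrained decoding with outlines. The response schema is hypothetical, and the outlines API has changed across releases; this follows the pre-1.0 interface:

```python
import outlines
from pydantic import BaseModel

# Hypothetical response schema: decoding is constrained so the model
# can only emit JSON that validates against it.
class HRAnswer(BaseModel):
    answer: str
    source_ids: list[int]  # indices of the chunks the answer relies on

model = outlines.models.transformers("Qwen/Qwen3-14B")
generator = outlines.generate.json(model, HRAnswer)

result = generator(
    "Question: When are salaries paid?\n"
    "Context: [1] Salaries are paid on the 25th of each month.\n"
    "Answer as JSON:"
)
print(result.answer, result.source_ids)  # result is a validated HRAnswer
```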
Grounding & Faithfulness Checking: to avoid hallucinations.
Answer Verifier: Use the deberta-v3-base-zeroshot-v1.1-all-33 model to judge whether the generated answer is consistent with the retrieved evidence.
Groundedness Score: Use paraphrase-multilingual-MiniLM-L12-v2 to compute semantic similarity between the answer and the retrieved chunks, or use zh_core_web_sm to check directly whether key assertions appear in the sources.
If the answer is not trustworthy, return a fallback instead of guessing.
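A sketch of the verifier and groundedness check combined with the fallback rule. The Hugging Face repo path for the deberta checkpoint and both thresholds are assumptions to be tuned on held-out QA pairs:

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# NLI-style entailment judge; repo path assumed for the checkpoint named above.
verifier = pipeline("zero-shot-classification",
                    model="MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33")
scorer = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

FALLBACK = "I could not find a reliable answer in the HR documents; please contact HR directly."

def check_answer(answer: str, evidence_chunks: list[str]) -> str:
    # Groundedness Score: highest cosine similarity between answer and any chunk.
    sims = util.cos_sim(scorer.encode(answer), scorer.encode(evidence_chunks))
    groundedness = float(sims.max())
    # Answer Verifier: treat the answer as the hypothesis and the
    # concatenated evidence as the premise; scores[0] is P(entailment).
    result = verifier(" ".join(evidence_chunks),
                      candidate_labels=[answer], hypothesis_template="{}")
    supported = result["scores"][0] > 0.5
    # Thresholds (0.6, 0.5) are illustrative.
    return answer if supported and groundedness >= 0.6 else FALLBACK
```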
Evaluation:
Retrieval Quality: Recall@k, Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain (NDCG), Context Relevance.
Answer Accuracy:
Lexical: Exact Match (EM), F1, ROUGE-1/2/L.
Semantic: BERTScore, Answer Similarity
Faithfulness: Hallucination Rate, Groundedness Score.
Answer Relevance: Semantic Relevance, On-topic Score.
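A sketch of two of the retrieval metrics, Recall@k and MRR, computed over the curated QA pairs; each query is assumed to carry the IDs of its gold evidence chunks:

```python
def recall_at_k(retrieved: list[list[int]], relevant: list[set[int]], k: int) -> float:
    # Fraction of queries with at least one gold chunk in the top-k.
    hits = sum(1 for got, gold in zip(retrieved, relevant) if gold & set(got[:k]))
    return hits / len(retrieved)

def mean_reciprocal_rank(retrieved: list[list[int]], relevant: list[set[int]]) -> float:
    # Average of 1/rank of the first relevant chunk per query (0 if absent).
    total = 0.0
    for got, gold in zip(retrieved, relevant):
        for rank, chunk_id in enumerate(got, start=1):
            if chunk_id in gold:
                total += 1.0 / rank
                break
    return total / len(retrieved)

# Two example queries with gold chunks {3} and {7, 9}.
retrieved = [[5, 3, 8], [7, 1, 2]]
relevant = [{3}, {7, 9}]
print(recall_at_k(retrieved, relevant, k=3))      # 1.0
print(mean_reciprocal_rank(retrieved, relevant))  # (1/2 + 1/1) / 2 = 0.75
```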
(IV) Results: The diagram below shows how the HR Agent answers the question: “What are the company’s overtime rules?”