Systems & Software

  • Optical Character Recognition (OCR) (Developed 2020.07)

    • Deep Learning-based Korean/English OCR Engine (KOR: 95%, ENG: 95%, General Document: 90%)


  • Korean Speech Recognition Engine (Developed 2020.01, revised 2021.02)

    • Deep Learning-based Korean Speech Recognition Engine (WER: 19%, CER: 9%)

    • Accuracy Enhancement (WER: 17.1%, CER: 6.5%)

    • Trained on 1,000+ hours of data


  • Commercial Crawling System (Developed 2018–present)

    • Scheduling multiple crawlers & managing related processes (Admin-side)

    • Apache Solr-based search (User-side)

    • Crawling 200+ websites concurrently

    • Easy management of 40+ machines and 200+ Scrapy crawling nodes

    • Integration with NLP-based document management system


  • NSCC (National Science & Technology Code Classifier)

    • Automatic document classifier based on the (Korean) Science and Technology Standard Classification (200 mid-level categories; Top1: 75%, Top2: 85%) [2020]

    • Automatic document classifier based on the (Korean) Science and Technology Standard Classification (145 mid-level categories; Top1: 78%, Top3: 90%) [2021.11]

      • Developed and fine-tuned the KorSci-Electra model