Building the NIA Medical Knowledge Q&A Dataset

About the Client
NIA is a public institution under the Ministry of Science and ICT that leads the planning and execution of national AI data projects and plays a pivotal role in building a high-quality AI training data ecosystem.
Project Overview
To promote the spread of the AI data ecosystem, NIA launched the “Ultra-Large AI Expansion Ecosystem Project,” a large-scale national initiative carried out through consortia of specialized companies and institutions. One sub-project focused on creating a 30,000-pair medical knowledge Q&A dataset to support advanced NLP tasks in healthcare.
- 15,000 Q&A pairs in specialized medical knowledge: Causes, diagnosis, treatment, management, prevention, and latest findings on diseases.
- 15,000 Q&A pairs in essential medical knowledge: Obstetrics & gynecology, pediatrics/adolescent medicine, emergency care, etc.
The challenge was that raw data from other consortium members had to be rigorously reviewed and corrected to ensure accuracy and consistency with actual medical knowledge. Success required not only access to highly qualified medical experts but also proprietary solutions capable of supporting such a demanding verification process.
Why NIA Chose Crowdworks
- Verified medical expert pool: Doctors, nurses, and medical trainees whose professional credentials have been verified and whose qualifications have been tested through evaluations.
- Tailored solutions for Q&A refinement, validation, and correction: Proprietary quality control tools optimized for large-scale data validation.
- Proven track record with NIA: Five consecutive years of successful participation in national AI data projects.
- Reliable project management: Extensive experience in handling large-scale projects with systematic and agile operations.
Project Solution Process
1) Selecting the Right Experts with Our Medical Badge System
Crowdworks used its proprietary Medical Expert Badge System to quickly secure qualified personnel. The system has six badge tiers (Doctor Levels 1–4, Nurse, and Medical Trainee), and a badge is awarded only to people who hold verified credentials and have passed specialized tests. This helped ensure that the work could be trusted.
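As an illustration only, the badge rule described above can be sketched as a small data structure. The class and function names here (Badge, Applicant, award_badge) are hypothetical and are not taken from Crowdworks' actual system. The sketch encodes only what is stated above: six tiers, and a badge awarded only when credentials are verified and the specialized test is passed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Badge(Enum):
    """The six badge tiers named in the text (identifiers are illustrative)."""
    DOCTOR_L1 = "Doctor Level 1"
    DOCTOR_L2 = "Doctor Level 2"
    DOCTOR_L3 = "Doctor Level 3"
    DOCTOR_L4 = "Doctor Level 4"
    NURSE = "Nurse"
    MEDICAL_TRAINEE = "Medical Trainee"


@dataclass
class Applicant:
    """Hypothetical applicant record; field names are assumptions."""
    name: str
    requested_badge: Badge
    credentials_verified: bool
    passed_test: bool


def award_badge(applicant: Applicant) -> Optional[Badge]:
    """Return the requested badge only if both conditions hold, else None."""
    if applicant.credentials_verified and applicant.passed_test:
        return applicant.requested_badge
    return None
```

Under this sketch, an applicant with verified credentials who has not passed the test receives no badge, which mirrors the "both conditions" rule in the description.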
2) A Three-Step Validation Process
- Medical Knowledge Validation: Raw AI-generated data unrelated to actual medical knowledge was systematically flagged as [Not Usable].
- Domain Classification: Verified data was categorized by specialty (e.g., obstetrics, pediatrics).
- Q&A Review and Correction: Each question and answer was individually validated, with the interface optimized to allow direct corrections for errors or missing details.
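The three steps above can be pictured as a simple review pipeline. The following is a minimal sketch, not Crowdworks' actual tooling: the QAPair record, the status labels other than [Not Usable], and the three callables are all assumptions. In the real project these judgments were made by human medical experts, so the callables stand in for an expert's decision at each step.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

NOT_USABLE = "Not Usable"  # label taken from the text
REVIEWED = "Reviewed"      # hypothetical label for pairs that pass review


@dataclass
class QAPair:
    """Hypothetical record for one question-and-answer pair."""
    question: str
    answer: str
    status: str = "Pending"
    domain: Optional[str] = None


def review(
    pair: QAPair,
    is_medically_valid: Callable[[QAPair], bool],
    classify_domain: Callable[[QAPair], str],
    correct: Callable[[QAPair], Tuple[str, str]],
) -> QAPair:
    """Apply the three validation steps in order to a single pair."""
    # Step 1: medical knowledge validation. Invalid pairs stop here.
    if not is_medically_valid(pair):
        pair.status = NOT_USABLE
        return pair
    # Step 2: domain classification (e.g., obstetrics, pediatrics).
    pair.domain = classify_domain(pair)
    # Step 3: review and correct the question and the answer.
    pair.question, pair.answer = correct(pair)
    pair.status = REVIEWED
    return pair


if __name__ == "__main__":
    # Stub decisions standing in for an expert's judgment.
    sample = QAPair("What is a fever?", "A raised body temperature")
    result = review(
        sample,
        is_medically_valid=lambda p: True,
        classify_domain=lambda p: "pediatrics",
        correct=lambda p: (p.question, p.answer + "."),
    )
    print(result.status, result.domain, result.answer)
```

Ordering the steps this way means that pairs flagged as [Not Usable] are never classified or corrected, so expert time is spent only on data that can be used.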

Example of a work interface
3) Real-Time Training and Rigorous Quality Control
Crowdworks provided detailed guidelines and live online training to ensure worker comprehension. Over the 3-month project, sample data and progress reports were regularly shared with consortium partners, with feedback promptly incorporated. Leveraging its proprietary validation system, Crowdworks successfully delivered 30,000 high-quality Q&A pairs that met all project requirements.

Sample Q&A Data
The Key to Success: Crowdworks’ Expertise and Extensive Government Project Experience
The project manager stated, “The key to this project was selecting professional experts rather than general workers and completing the tasks within the set timeframe.” Because NIA projects proceed sequentially through stages such as data collection, refinement, and processing, flexible schedule management was essential for building a large-scale dataset in a short period. Drawing on Crowdworks’ pool of verified experts and the operational know-how accumulated through years of government project execution, the team completed the project on time, supported by a thorough quality assurance framework and agile schedule management.
The completed dataset has been made publicly available through AI Hub and is expected to be widely used in developing medical AI services, such as chatbots that provide reliable, clinically grounded answers based on verified medical knowledge.
As AI technology penetrates deeper into specialized fields such as healthcare, law, and finance, the value of high-quality data verified by experts is becoming ever more critical. If you are planning a data project that demands specialized expertise and trusted quality, Crowdworks is ready to help.