Digitizing International Trader Qualification Exam with OCR Technology
What difficulties did the client face?
To stay ahead of the digital curve, our client, which administers the International Trade Professional Qualification Examinations, wanted to start offering the exam online. The client had been holding onsite examinations once or twice a year; by going digital, they could hold the exam as often as once a month, a 6x to 12x increase! The biggest obstacle in this plan, however, was building the Q&A data in a machine-readable format. In other words, the client had to build a digital question bank that stores all questions and answers presented in the paper exams. Our client struggled to transcribe all the information from its PDF documents into a machine-readable format, and thus reached out to Crowdworks for help!
Client's requirements
- Provide interim reviews to ensure the quality of data
- Deliver the final output in a Google Spreadsheet
- Offer high-quality OCR (Optical Character Recognition) technologies
Why Crowdworks?
- Proven track record of successfully delivering projects, from gathering requirements through to delivering on them
- Effective and efficient communication with the project team including timely reviews
- Meticulous quality assurance system
- Thoroughly annotated and inspected data
Crowdworks Solution
Preprocessing and Annotating the Data
To hold the exam more often, the exam makers had to develop more exams in a shorter period of time. The key was a comprehensive digital question bank. The exam makers required easy access to the entire pool of questions so that they could efficiently select the questions to be presented in the next round of any given exam. Without a comprehensive view, the exam makers had difficulty gauging the overall level of an exam, or simply lacked an adequate volume of questions to ensure diversity.
Crowdworks recognized this need and designed a tailored project that not only transcribes all questions in the PDF documents but also labels each question with metadata for effective data management as well as model training. We drew on our experience providing text transcription to a wide spectrum of clients when preparing the proposal. The client carefully reviewed the business value we could provide, and finally approved our proposal!
Since this was the client's very first data annotation project, they requested a detailed consultation that covered defining requirements, developing a detailed annotation guide, designing the annotation work screen, and more. We worked with the client's project team to thoroughly and accurately understand their business requirements and developed a tailored Google Spreadsheet to serve as the client's question bank. The Spreadsheet was designed to enable exam makers to quickly and easily search for specific problem types.
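To make the idea of a searchable question bank concrete, here is a minimal sketch of how one row of such a sheet and a simple search over it might look. It is not the client's actual Spreadsheet layout: the field names (exam_round, subject, question_type, and so on), the subject labels, and the search helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    # Every field name here is an assumption; the real sheet's columns are not documented.
    exam_round: str
    subject: str
    number: int
    question_type: str
    text: str
    choices: List[str] = field(default_factory=list)
    answer: str = ""


def search(bank, **criteria):
    """Return the questions whose attributes equal every given criterion."""
    return [
        q for q in bank
        if all(getattr(q, name) == value for name, value in criteria.items())
    ]


# Placeholder rows standing in for transcribed exam questions.
bank = [
    Question("round-1", "Subject 1", 1, "multiple_choice", "Sample question A",
             ["a", "b", "c", "d"], "b"),
    Question("round-1", "Subject 2", 1, "table_based", "Sample question B",
             ["a", "b", "c", "d"], "d"),
    Question("round-2", "Subject 1", 1, "table_based", "Sample question C",
             ["a", "b", "c", "d"], "a"),
]

# An exam maker looking for every table-based question in the bank.
table_questions = search(bank, question_type="table_based")
```

Keeping each question as one flat record mirrors a spreadsheet row, so the same filter could equally be expressed as a column filter in the sheet itself.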
Transcribing the questions using a unique method tailored to the client
The International Trade Qualification Exam is known for its difficulty and comprehensive coverage, with fifty questions from each of four subjects. This also meant that our labelers had to work with specialized types of data, including international purchase orders and invoice tables. Our project team quickly realized that developing a comprehensive and detailed annotation guide was critical to running the project successfully.
Hence, our team worked closely with the client's project team to discuss how to address different types of data and to recommend a classification scheme for the questions. Through this close collaboration, we drafted a tailored guide that included solutions for edge cases! For example, if a question included an image that couldn't be transcribed into text, the guide instructed the labeler to include a screenshot of the image instead.
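The screenshot rule from the guide can be expressed as a small decision function. This is only a sketch under assumptions: the SourceItem type, the to_cell helper, and the "text" / "screenshot" labels are hypothetical names, not part of the actual annotation guide or the Crowdworks platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceItem:
    # Hypothetical representation of one element of a question on the PDF page.
    text: Optional[str] = None             # transcription, or None if it cannot be transcribed
    screenshot_path: Optional[str] = None  # assumed location of a captured image


def to_cell(item):
    """Prefer transcribed text; fall back to a screenshot when no text is available."""
    if item.text is not None:
        return {"kind": "text", "value": item.text}
    if item.screenshot_path is not None:
        return {"kind": "screenshot", "value": item.screenshot_path}
    # Neither form is available, so the item needs manual review.
    raise ValueError("item has neither transcribed text nor a screenshot")


invoice_row = to_cell(SourceItem(text="Unit price: 100"))
diagram = to_cell(SourceItem(screenshot_path="q17_diagram.png"))
```

Raising an error for items with neither form keeps incomplete rows from slipping silently into the question bank.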
Comments from the Project Manager
“This project made me realize the full strength and potential of our platform to enable our labelers to work freely anytime, from anywhere! For example, opening a project that allows multiple labelers to simultaneously perform annotations on one PDF document was a simple task with our platform. Such project design lessened the overall burdens of the work as a labeler could focus on one specific annotation task and not multiple annotation tasks on one PDF document. The entire project team including myself felt immensely proud to see our client’s concerns disappear one by one as the project progressed!”