Building AI Training Data that Matches Complex Infographics with Text

About Client A
Client A is one of Korea’s leading telecommunications companies, driving business innovation based on ICT convergence technologies such as AI, big data, and cloud. Recently, the company has been introducing AI company-wide to enhance operational efficiency.
Project Overview
To further improve efficiency, Client A wanted its AI systems to better understand, summarize, and analyze internal documents. As part of this effort, the company aimed to build large-scale AI training datasets. Specifically, through Crowdworks, they sought to construct a dataset of Korean-language infographics—hierarchical flowcharts, diagrams, and other graphic elements—meticulously labeled and matched with text.

Example images
The specific tasks of this project are as follows:
- Collect Korean infographic images from publicly available datasets that are free of licensing issues
- Select images that have hierarchical structures and express relationships between components
- Process the information for each component and node, and match it with the information on inter-node relationships
- Write summary captions that describe each image
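To make the tasks above concrete, here is a minimal sketch of what one labeled record could look like. It is an illustration only, not the schema used in the project: the class names, field names (`node_id`, `level`, `relation`), and the sample file name are all assumptions.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Node:
    node_id: str   # hypothetical identifier for one component
    text: str      # text transcribed from the component
    level: int     # depth in the hierarchy (0 = top level)


@dataclass
class Edge:
    source: str    # node_id where the connecting line starts
    target: str    # node_id where the connecting line ends
    relation: str  # hypothetical relation label, e.g. "leads_to"


@dataclass
class InfographicAnnotation:
    image_file: str
    caption: str   # summary caption describing the whole image
    nodes: list[Node] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)

    def dangling_edges(self) -> list[Edge]:
        """Return edges that point at a node_id not declared in nodes."""
        known = {n.node_id for n in self.nodes}
        return [e for e in self.edges
                if e.source not in known or e.target not in known]

    def to_json(self) -> str:
        # ensure_ascii=False keeps Korean text readable in the output
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


record = InfographicAnnotation(
    image_file="example_flowchart.png",  # hypothetical file name
    caption="A two-step flow from request to review.",
    nodes=[Node("n1", "Request", 0), Node("n2", "Review", 1)],
    edges=[Edge("n1", "n2", "leads_to")],
)
print(record.to_json())
```

Keeping nodes and relationships in separate lists makes it straightforward to check that every connecting line refers to a labeled component before a record is delivered.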
Why Client A Chose Crowdworks
- End-to-end data services: From collection to processing, Crowdworks covers the entire workflow.
- Quality assurance: A systematic validation system ensures high-quality data.
- Expertise: Experienced professionals can design and execute even the most complex projects.
This was our first project of this kind, too. Was it really possible?
Client A expected data quality that fully matched their high standards. Even for Crowdworks, with its years of industry-leading expertise in data labeling and AI dataset construction, this was a challenging project. The reasons were:
- The data had to be license-cleared and also meet the project’s specific conditions
- The work was high-difficulty data construction that required defining and connecting relationships between images (complex infographics) and text
- This was a new initiative for Client A as well, with no similar prior experience, so the project process and methods were hard to predict and the standards and requirements were unclear
Project Solution Process
1) Designing the Data-Building Plan
We analyzed the requirements and sample data provided by Client A. Through this process, we identified areas where generative AI could be applied for automation. By combining our proprietary solutions with various open-source tools, we developed a customized data-building toolkit tailored to the client’s needs. The datasets were then categorized and managed by task difficulty, based on the number of nodes and connecting lines between objects.
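Categorizing by difficulty, as described above, could be sketched as follows. The tier names and the thresholds (`medium_at`, `hard_at`) are hypothetical, as is the choice to add node and line counts together; the actual criteria used in the project are not stated here.

```python
def difficulty_tier(node_count: int, edge_count: int,
                    medium_at: int = 10, hard_at: int = 25) -> str:
    """Bucket an infographic by structural size.

    node_count: number of nodes (components) in the image
    edge_count: number of connecting lines between nodes
    medium_at, hard_at: hypothetical cut-offs on the combined count
    """
    if node_count < 0 or edge_count < 0:
        raise ValueError("counts must be non-negative")
    size = node_count + edge_count
    if size >= hard_at:
        return "hard"
    if size >= medium_at:
        return "medium"
    return "easy"


# Example: (nodes, connecting lines) for three hypothetical images
samples = [(3, 2), (8, 7), (20, 24)]
tiers = [difficulty_tier(n, e) for n, e in samples]
print(tiers)
```

A tier label like this can then be used to route harder images to more experienced workers or to schedule extra review.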

(Figures: Step 1, Step 2, Step 3)
2) Deploying Qualified Experts
We determined that this project required specialists with strong expertise and a deep understanding of data context. To that end, we assigned verified data specialists who were familiar with mathematical and logical structures and algorithmic frameworks, and who could create or interpret such structures. They were also required to understand JSON and object-based data models. Before the project began, all workers underwent comprehensive guideline training and pre-tests to minimize risks that could affect quality.
3) Ongoing Review, Feedback, and Adjustments
Throughout the four-month project, we maintained regular communication with Client A, conducting periodic reviews of sample data and flexibly adjusting the workflow to reflect interim feedback. Since this was more than a simple image-labeling task, we implemented an integrated operational strategy that combined multiple tasks such as captioning, object mapping, and text transcription. Project managers collaborated closely with internal data engineers to map out the entire workflow and establish detailed guidelines, ensuring consistency and high quality across all deliverables.
“We want to work with Crowdworks again!”
At the beginning, Client A had significant concerns: the data structures were complex, and the tasks were entirely new territory, raising questions like, “Can this really work?” As the project progressed, however, Crowdworks built trust through fast and accurate communication, thorough requirement analysis, and flexible, tailored proposals. Client A was impressed with the project manager’s direction and operational strategy, and ultimately expressed strong satisfaction with the quality of the data delivered—so much so that they concluded, “We want to work with Crowdworks again on our next project.”
As AI technology advances and companies build increasingly diverse services, the need for high-quality, complex datasets continues to grow. If you ever encounter a project that seems almost impossible, Crowdworks is here to help. For us, it’s not just a challenge, but also an opportunity to provide the most efficient and effective solutions together with our clients.