Building Training Data for Maritime Autonomous Vehicles
What difficulties did the client face?
For any autonomous vehicle to navigate the ocean effectively, the model driving its navigation system must be trained on a sufficiently large volume of diverse data. Our client, like many developers of autonomous driving models, was struggling to secure large-scale, high-quality labeled data. In particular, while working out the most efficient and effective way to annotate the data, the client spent significant resources developing a solid labeling guide that explains to labelers of varying skill and experience how to perform the necessary annotations. In the end, the client concluded that, to deliver the project on time, it needed the help of an experienced, professional data annotation service provider, and reached out to Crowdworks!
Client’s requirements
- Filter the appropriate data from the open-source ship-related images provided
- Review and correct the annotations on the images according to the guide provided by the client
- Deliver around ninety thousand images within four weeks
Why Crowdworks
- Prompt and effective communication from the team
- Proven track record of successfully delivering projects for diverse industries, including leading autonomous driving companies
- On-time delivery leveraging crowdsourcing
- Data quality guaranteed by the proven quality control system
Crowdworks’s Solution
Preprocessing and Annotating the Data
The very first task we assigned to the group of labelers selected specifically for this project was to filter out inappropriate data from the open-source images of ships collected by the client. Inappropriate images included illustrations rather than actual photographs of ships, as well as images such as a picture of a ship sitting on a desk. These images could not be used to train the model and had to be filtered out in advance to prevent unnecessary annotation work.
Moreover, the source images provided by the client were already annotated with around twenty categories. The labelers were therefore asked to first remove the unnecessary categories, apply bounding boxes to the specific objects listed in the labeling guide, and then re-annotate those objects with the eight newly defined categories.
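The re-annotation step described above can be pictured as a category remapping. The following is only a minimal sketch under stated assumptions: the category names, the `CATEGORY_MAP` table, and the `remap_annotations` helper are hypothetical and do not come from the client's labeling guide, and in the actual project this work was performed and reviewed by human labelers.

```python
from typing import Optional

# Hypothetical mapping from original categories to newly defined ones.
# A value of None marks an unnecessary category that should be removed.
# These names are illustrative only, not the client's real categories.
CATEGORY_MAP: dict[str, Optional[str]] = {
    "cargo_ship": "vessel",
    "container_ship": "vessel",
    "sailboat": "small_craft",
    "buoy": "navigation_aid",
    "seagull": None,
}


def remap_annotations(annotations, category_map):
    """Split annotations into remapped ones and ones needing human review.

    Each annotation is assumed to be a dict with at least a "category" key.
    """
    remapped = []
    needs_review = []
    for ann in annotations:
        old_category = ann["category"]
        if old_category not in category_map:
            # Unknown category: leave the decision to a labeler.
            needs_review.append(ann)
            continue
        new_category = category_map[old_category]
        if new_category is None:
            # Unnecessary category: drop the annotation.
            continue
        # Copy the annotation so the original data stays untouched.
        remapped.append({**ann, "category": new_category})
    return remapped, needs_review
```

In a sketch like this, anything the table does not cover is routed to a person rather than guessed at, which mirrors the review-and-correct workflow the client asked for.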
Meanwhile, because this was a tightly scheduled project, our skilled project manager carefully monitored progress throughout and ensured that any issue was escalated and resolved promptly.
Building the optimal labeler pool with the AI-assisted worker-task matching technology
To successfully deliver the project within a demanding timeline, we knew we had no choice but to build a dream team of labelers with relevant experience and domain expertise. Work done by an inexperienced labeler will inevitably have a higher rejection rate during review, and every rejection means rework that delays the project.
Yes, it was the perfect time to use our AI-assisted worker-task matching technology! This technology is built on our data from 0.5 million labelers. We've analyzed labelers' profile data, such as sex, age, and work experience, as well as their performance in data labeling projects, to identify each labeler's characteristics and preferences. Our technology combines this multifaceted data on labelers with project context data, such as the required annotation type, to select the best-suited group of labelers for any project.
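To make the idea of worker-task matching concrete, here is a deliberately simple sketch. It is not the Crowdworks technology, whose internals are not described here. It assumes a hypothetical `Labeler` record and ranks labelers by a hand-picked weighted score of past approval rate on the required annotation type and overall experience. The weights, the experience cap, and all names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Labeler:
    name: str
    # Past approval rate (0.0 to 1.0) per annotation type (hypothetical field).
    approval_rates: dict = field(default_factory=dict)
    # Total number of tasks completed across projects (hypothetical field).
    completed_tasks: int = 0


def match_score(labeler, annotation_type, experience_cap=1000):
    """Score a labeler for a task type; the weights are arbitrary assumptions."""
    approval = labeler.approval_rates.get(annotation_type, 0.0)
    # Cap experience so very high task counts do not dominate the score.
    experience = min(labeler.completed_tasks, experience_cap) / experience_cap
    return 0.7 * approval + 0.3 * experience


def select_labelers(labelers, annotation_type, pool_size):
    """Return the pool_size highest-scoring labelers for the annotation type."""
    ranked = sorted(
        labelers,
        key=lambda labeler: match_score(labeler, annotation_type),
        reverse=True,
    )
    return ranked[:pool_size]
```

For example, calling `select_labelers(candidates, "bounding_box", 50)` would return up to fifty candidates ranked by this score. A production system would presumably learn such a ranking from data rather than rely on fixed weights.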
The result was a huge success! We delivered over ninety thousand high-quality labeled images to our client within a month!
Comments from the Project Manager
“The client’s requests were indeed very challenging. The timeline was tight, and the volume of data wasn’t small. The project required quick decision-making from my side and quality labeling from the labelers. I have to admit that if it weren’t for our skilled labelers and the reviewers (who inspect the labelers’ work), I wouldn’t have been able to deliver the project on time. A special shoutout goes to our PI (Project Improvement) team, who secured the right group of labelers for our project.”