Why Are Companies Good at Data Sure to Rock with LLMs?
Editor’s Note
On April 17th, ‘ACC (Advanced Computing Conference)+ 2024’ was held at COEX in Seoul. At the event, Crowdworks AI’s CTO, Hyungjoo Lee, led a session on why data matters when enterprises adopt LLMs. More than 200 attendees filled the session hall, a sign of how intensely enterprises are interested in AI adoption. Below is a summary of his presentation, reorganized to share these insights with our clients.
▲ Crowdworks AI CTO Hyungjoo Lee
“Does Crowdworks AI offer LLM service?”
When we first introduce Crowdworks AI to clients, a common question is: “I know you’re a data specialist, but do you also specialize in LLM services?” What do clients mean by specializing in LLM services? They mean the ability to integrate LLMs effectively into business tasks. When people think of an enterprise with LLM expertise, certain phrases come to mind:
‘top ranking on the leaderboard’, ‘having its own SLM’, ‘publishing AI-related papers’, ‘having solutions applicable to every client’, ‘preparing a universal LLM evaluation dataset’, ‘employing PhD-level staff’…
Of course, these matter. Without these efforts, we wouldn’t have the advances in LLM technology and the market growth we see today. But are they truly the most critical factors for enterprise LLMs?
| Role/characteristics of enterprise LLMs | Role/characteristics of general AI models |
|---|---|
| Enhances the efficiency of tasks previously done by humans | Performs tasks that are difficult or impossible for humans |
| Answers vary with each client’s workflows | Gives consistent answers to identical problems and requirements |
| Answer quality is not always measured against a single fixed standard | Quality is determined directly by adherence to the correct standard |
| Unexpected, even better, results may occur | Unexpected answers are incorrect |
Because an enterprise LLM’s answers depend on each client’s workflows, the key to developing one is understanding the client’s tasks. Without a thorough understanding of those tasks, it is hard to build an LLM that is genuinely useful. So how can we understand a client’s tasks? Enterprise tasks usually come down to ‘data’ and how it ‘flows’ (as shown in the image below).
After all, understanding the client’s tasks means getting a handle on the content, structure, and context of that data.
The performance of LLMs is determined by data. Foundation models are well trained on general data and perform outstandingly, but enterprise-level LLMs must also make effective use of internal data.
This is why we can do better than others at developing enterprise LLMs. Drawing on seven years of expertise, we understand our clients’ tasks (their data), identify what is missing, and know how to interpret that data and prepare it effectively for LLMs.
Enterprises’ internal data for LLMs: the reality is…
So, let’s look at some cases involving internal data from enterprise LLM projects.
[Use Case 1] From “Lots of Data” to “No Data”
One enterprise asked us to develop an LLM, claiming to have a substantial amount of internal data, anywhere from 200,000 to as many as 1 million sets. They said we could access all their documents and build a chatbot on top of them. But when we looked at their sample data, it was chaos: NO rules, NO standards. Document titles, authorship, nothing was organized. We couldn’t even categorize the data, and the client said, “Can’t RAG figure it all out for us? We don’t want to organize it; we just want the chatbot.” (True story)

As many of you may know, a project like this cannot move forward. A bundle of files nobody understands isn’t data. RAG depends on metadata: without document titles, authorship, or categories, a retrieval system has no reliable way to narrow down which documents answer a question. Without knowing what data the enterprise possesses, building an efficient system is out of the question. When we start a project, we spend a lot of time on initial consultations, defining requirements, analyzing documents, and preparing metadata. That is how we develop an LLM that is genuinely practical for enterprises. Unfortunately, this client wasn’t ready to put in the effort to get their data in shape, so the project was halted.
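To make the point about metadata concrete, here is a minimal Python sketch, assuming a simple in-memory document store. The field names (`title`, `author`, `department`, `created`), the keyword-overlap score, and the department filter are hypothetical stand-ins chosen for illustration, not Crowdworks AI’s pipeline or any vector database’s API. It shows two things: an audit that flags documents with missing metadata, and a retrieval step that can only narrow its search when that metadata exists.

```python
from dataclasses import dataclass, field

# Hypothetical metadata schema; in a real project it is defined with the client.
REQUIRED_FIELDS = ("title", "author", "department", "created")


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def audit_metadata(docs):
    """Return {doc_id: [missing fields]} for documents with incomplete metadata."""
    report = {}
    for doc in docs:
        missing = [f for f in REQUIRED_FIELDS if not doc.metadata.get(f)]
        if missing:
            report[doc.doc_id] = missing
    return report


def score(query, text):
    """Toy relevance score: count of shared lowercase words (a stand-in for embeddings)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(docs, query, department=None, k=2):
    """Return up to k matching doc_ids, optionally restricted to one department.

    A document with no 'department' value can never pass the filter, so
    unorganized files cannot be narrowed down this way.
    """
    candidates = [
        d for d in docs
        if department is None or d.metadata.get("department") == department
    ]
    ranked = sorted(candidates, key=lambda d: score(query, d.text), reverse=True)
    return [d.doc_id for d in ranked[:k] if score(query, d.text) > 0]


docs = [
    Document("hr-001", "annual leave policy for employees",
             {"title": "Leave Policy", "author": "HR Team",
              "department": "HR", "created": "2023-01-10"}),
    Document("fin-002", "expense policy and leave of absence costs",
             {"title": "Expense Policy", "author": "Finance Team",
              "department": "Finance", "created": "2022-11-02"}),
    Document("scan_0042", "leave policy draft v3 final final"),  # no metadata at all
]

print(audit_metadata(docs))
# {'scan_0042': ['title', 'author', 'department', 'created']}
print(retrieve(docs, "leave policy", department="HR"))
# ['hr-001']
```

In a production system the keyword score would typically be replaced by embedding similarity, but the role of metadata stays the same: the audit shows which files need titles and owners before indexing, and the filter only works on documents that have them.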
[Use Case 2] From “All Data” to “Half Data”
Some enterprises prepared datasets internally ahead of LLM adoption, some with more than 8,000 datasets ready. Even so, we were able to use only half of the data we received. Take one case: the enterprise wanted an LLM-powered chatbot for copywriting and content ideas. They said they had the data we needed, but all they had were copies of old ad campaigns. To fine-tune the LLM properly, we need plenty of metadata, such as target audience, style, and tone. But they handed us only the finished copy, with no metadata. So we moved the project forward by consulting with them, identifying what was missing from the data, and collecting the additional metadata. In a case like this, Crowdworks AI’s role is to ‘complete the data’.
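The gap in this case can be sketched in a few lines of Python, assuming the old campaigns arrive as dicts. Campaigns that carry target, style, and tone become prompt/completion pairs for fine-tuning; campaigns that don’t are listed with their missing fields so they can go back to the client. The `product` field, the prompt template, and the prompt/completion JSONL layout are assumptions for illustration, not a specific provider’s required format or the project’s actual pipeline.

```python
import json

# 'target', 'style', and 'tone' are the metadata named in this case;
# 'product' is a hypothetical extra field added for the example.
REQUIRED = ("product", "target", "style", "tone")


def split_ready(campaigns):
    """Split campaigns into (ready, {id: missing fields}) by metadata completeness."""
    ready, needs_metadata = [], {}
    for c in campaigns:
        missing = [f for f in REQUIRED if not c.get(f)]
        if missing:
            needs_metadata[c["id"]] = missing
        else:
            ready.append(c)
    return ready, needs_metadata


def to_training_record(c):
    """Turn one campaign into a generic prompt/completion pair (format is an assumption)."""
    prompt = (
        f"Write ad copy for {c['product']}.\n"
        f"Target audience: {c['target']}\n"
        f"Style: {c['style']}\n"
        f"Tone: {c['tone']}"
    )
    return {"prompt": prompt, "completion": c["copy"]}


campaigns = [
    {"id": "ad-01", "copy": "Wake up to the freshest cup in town.",
     "product": "cold brew coffee", "target": "office workers in their 20s and 30s",
     "style": "short slogan", "tone": "energetic"},
    # Copy with almost no metadata, like the material the client first handed over.
    {"id": "ad-02", "copy": "Comfort that goes the distance.", "product": "running shoes"},
]

ready, needs_metadata = split_ready(campaigns)
for c in ready:
    # One JSON object per line (JSONL), a common layout for fine-tuning files.
    print(json.dumps(to_training_record(c), ensure_ascii=False))
print(needs_metadata)  # {'ad-02': ['target', 'style', 'tone']}
```

The list of missing fields is the useful by-product here: it turns “the data isn’t good enough” into a concrete checklist to work through with the client.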
[Use Case 3] From “No Data” to “Lots of Data”
Another client wanted an AI translation app for the construction and shipbuilding industries, which are full of specialized jargon. Existing translation apps often mistranslate this jargon, which makes them useless there. At first, the client said they had no data and simply handed us a glossary of industry terms. After going through it, though, we found it was well organized and contained enough information to get started, and the project was approved. Of course, we also spent a lot of time talking to real users to make sure the app met their needs and performed well. In this case, Crowdworks AI’s role was to ‘uncover the significant value in small datasets’.
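One common way a glossary like this can be put to work is to inject the matching entries into the translation prompt and then check the output for the approved terms. The Python sketch below shows that idea; the glossary entries, the prompt wording, and the English-to-Korean direction are assumptions for illustration, not details of the client’s app.

```python
import re

# Hypothetical excerpt of a client glossary (source term -> approved translation).
GLOSSARY = {
    "ballast tank": "밸러스트 탱크",
    "bulkhead": "격벽",
    "keel": "용골",
}


def find_terms(text, glossary):
    """Return glossary terms found in the text as whole words, longest term first."""
    found = []
    for term in sorted(glossary, key=len, reverse=True):
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            found.append(term)
    return found


def build_prompt(text, glossary):
    """Add only the matching glossary entries to a translation prompt as hard constraints."""
    terms = find_terms(text, glossary)
    rules = "\n".join(f"- {t} -> {glossary[t]}" for t in terms)
    return (
        "Translate the following shipbuilding text into Korean.\n"
        f"Use these approved terms exactly:\n{rules}\n\n"
        f"Text: {text}"
    )


def missing_terms(source, translation, glossary):
    """List glossary terms whose approved translation is absent from the output."""
    return [t for t in find_terms(source, glossary) if glossary[t] not in translation]


src = "Inspect the bulkhead next to the ballast tank before welding."
print(build_prompt(src, GLOSSARY))
# A translation that rendered "bulkhead" as a generic word for "wall":
print(missing_terms(src, "용접 전에 밸러스트 탱크 옆의 벽을 점검하십시오.", GLOSSARY))
# ['bulkhead']
```

A glossary alone does not make a translation model, but a well-organized one is already structured data: it can drive both prompting and automatic quality checks.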
What to do before chasing the latest technology
With AI technologies evolving rapidly, a flood of articles and information is released every day. The constant influx of specialized terminology can be overwhelming and hard to keep up with.
However, enterprises that focus solely on the technology itself and chase every new development risk losing their way. What matters more is finding the technology best suited to their goals, not simply keeping up with the latest trends.
As an expert in enterprise LLM adoption, I recommend starting by evaluating your internal data rather than blindly pursuing the latest technologies. To achieve your objectives, first understand what data you have and how it can be used. If you find this challenging, Crowdworks AI is always here to help.