Interns in the large-model craze: nearly everyone is from a "985" university, yet the job is "labeling" at a big tech company
Source: Jiazi Guangnian
Author: Zhu Yue
On the first day of the official internship, Chen Xi felt that she might have been "cheated".
Chen Xi, a prospective graduate student who had just finished her school graduation thesis, was ready to find something to do for herself. After submitting several resumes, she was soon invited to the artificial intelligence editing position (translation direction) of a major domestic Internet company.
The job description reads:
For Chen Xi, who didn’t know much about model training, this seemed like a pretty good internship.
Chen Xi interviewed for the translation track, which matched her English major well. ChatGPT became popular in China at the beginning of the year, and Chen Xi uses AI products every day, so the job also matched her interests. In addition, the chance to take part in the development of an emerging technology industry is rare for liberal arts students. Of course, the biggest attraction was the major Internet company itself. Over the past few years, the company has attracted countless young students. From a certain perspective, the name of a major company on a resume is enough to serve as a symbol of one's abilities.
However, apart from the simple job description on the recruitment page, Chen Xi did not get any more information about this internship from the interviewer.
"The reason I felt cheated at the time was because HR basically focused on translation-related issues during the interview." After answering several translation questions during the interview, Chen Xi successfully got the offer. Until she started working, she thought it was a job as a translator.
Chen Xi is not the only one who feels "cheated".
Yang Xiaoyun, one of the earliest artificial intelligence editing interns, also joined this big tech company at the end of February. The interviewer told her that the job demanded strong abilities in information gathering, summarizing, and text editing.
After actually getting started, she realized: "The work described by HR and the actual work are two completely different things. No matter how glamorous it sounds, it is actually a 'labeling' job."
Today, the artificial intelligence craze has produced chatbots that talk like real people and drawing software that generates pictures from simple prompts. With the phenomenal rise of large models, the foundations for training them (data, algorithms, and computing power) have attracted much attention, and data annotation is an indispensable link in the data pipeline.
In 2007, Li Feifei, then an assistant professor in the Department of Computer Science at Princeton University, started a project called ImageNet, hoping to expand the data that can be used to train AI algorithms.
To provide as many visual examples as possible for each word, nearly 50,000 workers on Amazon's crowdsourcing platform Mechanical Turk spent two and a half years labeling objects in pictures, such as balloons and strawberries, for a total of 3.2 million pictures. These workers came from 167 countries around the world, mostly from areas with low labor costs.
An investigation by Time magazine found that, to reduce violence, sexism, and racism in the ChatGPT data set, OpenAI used Kenyan laborers who earned less than $2 an hour. Bloomberg reported that Google's AI chatbot Bard was trained by thousands of contract workers, who had only 3 minutes to review and annotate Bard's answers.
For a long time, data annotation for language and image recognition did not demand much cognitive ability. In the era of large models, data annotation has shifted from images to language, and the work has become more demanding and more specialized, requiring professional knowledge in specific fields and fluent language skills.
But for ordinary data annotators, it is still a low-tech job that is constantly repeated.
Like these students from prestigious schools who feel "cheated" by their big-company internships, ordinary annotators cannot clearly explain what their work is for or what value it has. They often have only a vague understanding that it is meant to "train large models".
Artificial intelligence editing interns represented by Chen Xi and Yang Xiaoyun were born out of the need to train large models. These popular large-scale models allow interns to enter with curiosity and longing. At the same time, they also feel the real chaos and lack of value behind them.
1. When college students flood into big model data annotation
According to the "National Vocational Skills Standards for Artificial Intelligence Trainers" released by the Ministry of Human Resources and Social Security in 2021, the common education level of artificial intelligence trainers is junior high school graduation (or equivalent education). They may be scattered in areas where traditional labor-intensive enterprises are located, such as Hebei, Henan, Shandong, and Shanxi, or even in more remote mountainous areas where data annotation is a pilot project for poverty alleviation.
**But the emergence of large models has already brought changes.**
What makes Yang Xiaoyun bored is actually the data annotation work done to train large models.
After simple training and an assessment, Yang Xiaoyun was assigned to the copy editing team. **Her daily job is to answer questions in the question bank. The purpose is to optimize the training of large models with answers written manually by annotators.**
**The steps for answering a question are strictly controlled.** Take the game "Genshin Impact" as an example. For the question "What are Yelan's artifacts?", Yang Xiaoyun needs to split the answer into several paragraphs: first, who is Yelan? Second, what are artifacts? Finally, which artifacts suit Yelan?
She collects information on the designated search engine, edits the answer, and finally submits it in Markdown format.
Apart from simple, easy-to-answer questions, Yang Xiaoyun spent most of her time on specialized areas that were completely unfamiliar to her, such as economics and law.
**Obviously, this is completely different from previous data annotation work.**
Before the emergence of large models, data annotation typically took place in workshops with hundreds of people, each at a computer, where the only sound was the clicking of mice and keyboards. During their 8-hour working day, annotators did one simple, repetitive thing: drawing boxes around motor vehicles, non-motor vehicles, pedestrians, and traffic lights in pictures (object detection), or marking the subject, predicate, and object of a passage (semantic segmentation).
Drawing boxes on pictures and videos and segmenting text are both ways of processing existing data, and the annotators themselves do not need to give "creative conclusions." **But this is not the case for large-model data annotation. In addition to processing existing data, annotators also need to answer questions and give correct conclusions.**
According to the "In-depth Analysis and Investment Trend Research Report on the Current Situation of China's Data Annotation Industry (2023-2030)" released by Guanyan Tianxia Data Center in 2023, before the release of ChatGPT, AI training data annotation was mainly for speech and computer vision, and natural language processing (NLP) accounted for less than 15% of demand.
As the ChatGPT chatbot becomes a phenomenal application of AIGC, there is an increasing demand for high-quality text annotation tasks such as emotional judgment, understanding ability and even reasoning ability.
"Large-model projects are more complex than earlier projects, and the requirements for personnel are correspondingly different," the head of the product department at Stardust Data told "Jiazi Guangnian". "Recognizing and annotating visual information for autonomous driving is closer to physical work and requires some training. Once employees learn how to draw boxes, master the shortcut keys, and pick up a few techniques, they quickly become competent. **But what large models need is a complete, structured, diversified, and all-encompassing data system, with four layers of data to support building and improving the model. These data cover pre-training, SFT (Supervised Fine-Tuning), RLHF (Reinforcement Learning from Human Feedback), private deployment, and so on. In response to the needs of different industries, we have released the COSMO large-model data pyramid solution. For large-model data annotators, labeling COSMO data is not a multiple-choice exercise, nor is it simple reading comprehension and text editing. It requires them to create questions and answers and to produce content.**"
Jia Yuhang, general manager of Cloud Measurement Data, divides the training data for large models into three stages: basic data, scene data, and scene data optimization. **He likens these three stages to the process of learning.**
"Basic data annotation such as drawing boxes is relatively simple, and anyone who can operate a computer picks it up quickly. Scene data is data from a specific field, needed for targeted research and development at specific stages, and annotators need to learn the relevant domain knowledge to meet the annotation requirements. By the third stage, which rests on continuous iteration and optimization once the model is in use, the requirements for skills and domain knowledge become even more refined," Jia Yuhang said.
Given these demands, more and more large-model companies are recruiting data annotators, and the education level they require has shifted from low to high. This demand is still growing.
On the domestic mainstream job search platforms, many data annotation positions for large models are currently being recruited. These positions require annotators to have a bachelor's degree or above. Baidu has previously stated that its large model data annotation base in Haikou has hundreds of data annotators, and the undergraduate rate has reached 100%.
2. Harsh large model data annotation
Behind these repetitive tasks lies the technique of "Reinforcement Learning from Human Feedback" (RLHF). The biggest improvement in GPT-3.5 comes from it, and the key is the participation of humans (labelers), that is, these data annotators.
Of the three steps of RLHF (collecting human-written answers for supervised fine-tuning, collecting human rankings of answers to train a reward model, and optimizing the model against that reward model with reinforcement learning), steps one and two are relatively more important, because they determine the quality of the data needed to train the reward model. **The data annotation interns working on these two steps are divided into two core groups: the "editing group" and the "sorting group".**
The job of the editing team is to answer the questions in the question bank; while the job of the sorting team is to rank the generated answers (including model and artificially generated answers).
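The rankings produced by the sorting team are typically turned into pairs of answers, one preferred over the other, which are then used to train a reward model. A minimal sketch of the pairwise ranking loss commonly used for this purpose is shown below. It follows the published InstructGPT-style formulation and is not necessarily what this company uses; the function name and the example reward values are illustrative assumptions.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(r_preferred - r_rejected)).

    The loss is small when the reward model already scores the answer the
    annotator preferred higher than the answer the annotator rejected.
    """
    margin = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for two answers to the same question.
agrees = pairwise_ranking_loss(2.0, 0.0)     # model agrees with the annotator
disagrees = pairwise_ranking_loss(0.0, 2.0)  # model disagrees with the annotator
print(agrees < disagrees)
```

Averaging this loss over many annotated pairs pushes the reward model toward the annotators' preferences, which is why the consistency of the human rankings matters so much.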
Ding Xiaoyu joined the copy editing team in July. Like Chen Xi, she is an English major who had looked forward to a translation job that would improve her professional skills, but her actual work had nothing to do with English.
Compared with when Yang Xiaoyun was interning in February, the copy editing team faced by Ding Xiaoyu has become more subdivided. Each intern has to choose a vertical direction, such as entertainment, physics, politics, etc., and the answer requirements have become more detailed.
For a multiple-choice question on ancient poetry, it is not enough to explain the answer. The annotator must first introduce the question type, then give the translation and background of the poem, and finally analyze whether each option is correct. Most importantly, the answer must be benchmarked against GPT-4, which OpenAI released on March 14.
"You have to refer to its answer, but it cannot be the same as its answer, and it must be better than its answer." Ding Xiaoyu was helpless.
Chen Xi was assigned to the sorting group, where she ranked multiple answers to the same question every day to determine which were better and which were worse.
The results of ranking need to be clearly quantified. She needs to rate the answers from different perspectives such as usefulness, authenticity, relevance, safety, etc. and write down the reasons. This is to allow machines to get infinitely closer to the answers humans expect.
**Chen Xi found that she sometimes had to choose among several bad answers. And when all the answers were bad, she was asked to write a better one herself.**
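The sorting work described here, rating each answer on several criteria and then ranking the answers, can be sketched as follows. This is only an illustration: the 1 to 5 scale, the equal weighting of the four criteria, and all function names are assumptions, since the article does not describe the company's actual scoring scheme.

```python
from itertools import combinations

# The four perspectives named in the article; equal weighting is an assumption.
CRITERIA = ("usefulness", "authenticity", "relevance", "safety")


def total_score(scores: dict[str, int]) -> int:
    """Sum the per-criterion ratings of one answer (hypothetical 1-5 scale)."""
    return sum(scores[criterion] for criterion in CRITERIA)


def rank_answers(rated: dict[str, dict[str, int]]) -> list[str]:
    """Return answer IDs from best to worst; ties keep their input order."""
    return sorted(rated, key=lambda answer_id: total_score(rated[answer_id]), reverse=True)


def preference_pairs(ranking: list[str]) -> list[tuple[str, str]]:
    """Expand a ranking into (better, worse) pairs for reward-model training."""
    return list(combinations(ranking, 2))


# Hypothetical ratings for three answers to one question.
rated = {
    "A": {"usefulness": 3, "authenticity": 4, "relevance": 3, "safety": 5},
    "B": {"usefulness": 5, "authenticity": 5, "relevance": 4, "safety": 5},
    "C": {"usefulness": 2, "authenticity": 2, "relevance": 3, "safety": 5},
}
ranking = rank_answers(rated)
print(ranking)
print(preference_pairs(ranking))
```

A ranking of three answers already yields three comparison pairs, so a single annotator's judgment propagates into many training examples.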
Ding Xiaoyu of the editorial team faces even more demanding requirements. Each answer will face two rounds of review before being qualified for delivery. The first one came from the team leader: "After completing a few questions, we will have a review meeting to find faults with us until the team leader is satisfied with the changes." The second one came from the headquarters, and it is not over until the headquarters has passed the review.
Once, due to formatting errors, most of Ding Xiaoyu's answers were judged to be completely wrong. "It might be enough to adjust the order, but they don't care whether the content of your answer is wrong or there is a problem with the format. It's just that it's all wrong."
What made Ding Xiaoyu even more devastated was that the team leader directly stated that if she made so many mistakes again, she might be dismissed.
**Data annotation for large models is an absolutely result-oriented job. No matter how much effort goes into the process, if the results are not good, all the previous effort is completely negated.**
But the problem is that whether it is the answer output of the editing group or the sorting of answers by the sorting group, it is a very subjective task. It is difficult for data annotation interns to control whether an answer is good or bad; different interns often give different answers to the same question.
**To solve this problem, one task the large-model data annotation team must perform every day is to hold a "review meeting", known within the company as the "alignment meeting". Its purpose is to align the answer standards, align everyone's understanding, and align all suggestions.**
However, it is quite difficult to achieve true alignment. This is just like the college entrance examination grading. Different people will be assigned the same questions. If the scores are inconsistent, they must be continuously adjusted until a unified score is obtained.
In Chen Xi’s impression, two or three hours are spent in meetings every day. By the end of the meeting, the simplest and crudest solution is often finalized, with the minority obeying the majority. She described it as "creating value without value."
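The "minority obeys the majority" rule amounts to a simple majority vote over the annotators' choices. A minimal sketch, assuming each annotator names the answer they consider best; the function name and the sample votes are hypothetical.

```python
from collections import Counter


def majority_label(votes: list[str]) -> tuple[str, float]:
    """Return the most common vote and the share of annotators who chose it.

    On a tie, the option that reached the top count first in the input wins.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


# Five hypothetical annotators pick the best of two candidate answers.
winner, agreement = majority_label(["A", "B", "A", "A", "B"])
print(winner, agreement)
```

The agreement share shows how contested a decision was, which a bare majority vote otherwise hides.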
However, compared with everyone sitting together to "manually" align the answer standards, there is a more troublesome problem: **the standards are not aligned once and for all, but must be constantly adjusted based on feedback from the model's output.**
The first thing Yang Xiaoyun does at work every day is check whether a new annotation standard has been issued, covering everything from the framework of the answer and the splitting of paragraphs to the choice of search engine and formatting details such as spaces and punctuation. **But the standards are constantly changing.** Once it turns out that the data fed to the machine does not work, the standards have to be drawn up again, and all the questions are thrown out and rewritten.
"It's like knitting. Should we knit horizontal or vertical ribs? Sesame stitch or wheat-ear stitch? But whatever the stitch, it has to be put into the program and run. If you find that it doesn't work, you have to change the method," Yang Xiaoyun explained to "Jiazi Guangnian". The point of the metaphor is that if the annotated answers do not achieve the expected effect when training the reward model, the standards must be adjusted.
The change of standards means that the conclusion of the last alignment meeting is invalid and the standards have to be aligned again.
"Redundant and efficient, talking nonsense very efficiently every day." Yang Xiaoyun complained.
3. High-achieving students exploited by big tech companies
**What these interns have in common is a high level of education. The recruitment requirement is a bachelor's degree or above, but many interns have a master's degree.**
Many of them were educated at top universities in China and even the world. Yang Xiaoyun was surrounded by students from Peking University and Imperial College London, and the interns next to Chen Xi's workstation were from Nankai University and the University of Electronic Science and Technology of China. Ding Xiaoyu was told explicitly during training that the interns had been screened by academic qualifications. "He (the interviewer) said that highly educated college students like us learn quickly and get started easily."
**Managing a group of smart people is never easy, because they can easily see through the repetitive actions to the essence of their work, and then question whether the job is really valuable for their future.**
Ding Xiaoyu described her work as "of little value and very draining."
When she gets to her workstation every morning, she turns on her monitor and opens her notebook, checking the rules in the notebook while writing answers on the monitor. Ding Xiaoyu can clearly feel that the detailed rules and procedures are gradually taking away her room to think, and that she is being disciplined into a machine. "If you aren't learning anything, and you don't have the energy to learn other things, you slowly lose your motivation to learn and your enthusiasm for doing anything else."
Ding Xiaoyu has also worked in the desensitization team, although the actual work had no real connection with the word "desensitization." She simply put the same questions to different chatbots and the company's internal beta products, then compared and scored the answers. After only a few days, she was transferred to the text proofreading team, where she corrected errors introduced when converting PDF files to Word, mainly typos and punctuation. In a process she described as "near breakdown," she completed 25 pages of medical-related error correction every day.
During the interview, the interviewer asked Ding Xiaoyu whether she could accept a boring and repetitive job. "My answer at the time was that I could. I think every candidate would answer that way." Because she had only one internship from her undergraduate years, and because she hoped to accumulate more internships and experience a big company, Ding Xiaoyu chose to join despite her doubts.
In just two months, Ding Xiaoyu has been regarded as the person who persisted to the end among the interns in the same period. She witnessed many interns come in with high ambitions and then leave with their heads down.
Anthropologist David Graeber defines bullshit jobs as jobs that have no meaning or purpose: jobs that should have been eliminated by automation but continue to exist as window dressing, to please superiors, or to patch holes in the system. Data annotation is like a variation on the bullshit job: work that is widely assumed to have been taken over by machines, but that still requires humans to do it.
When the artificial intelligence craze arrives, people often hear the expectation that AI can replace humans in completing repetitive and boring tasks, thereby allowing humans to have more time and energy to pursue more creative and fulfilling work.
But it is also possible that artificial intelligence will turn out like earlier labor-saving technologies such as the telephone and the typewriter, which removed the pain of transmitting information and writing by hand but also created a large amount of communication and paperwork that required new human labor to manage, such as receptionists and clerks. AI may not replace humans, but it will create more tedious, boring, and isolating jobs.
**Besides getting no recognition of the value of their work, these top students may not get "price recognition" from their salaries either.**
"Jiazi Guangnian" learned that these data annotation interns are not paid much. In a first-tier city, most artificial intelligence interns earn 150 yuan/day, with a housing allowance and a free canteen. In a second-tier city, only 100 yuan/day is left, the housing allowance is reduced by one-third, and a 20 yuan meal subsidy replaces the free meals.
For someone like Ding Xiaoyu, who interned in a second-tier city, the office is in a prosperous part of the city center, where a takeaway meal can easily exceed the 20 yuan meal subsidy, so she basically has to dip into her internship salary to eat.
Because most of them are just basic annotators for training large models, they may be uniformly assigned to positions that have nothing to do with their profession. They may also be transferred to different departments at any time and are required to get started quickly after a short training.
**Ding Xiaoyu described them as batches of interns being taken advantage of by big tech companies.**
Chen Xi clearly felt that she was not the only one who felt the gap between expectations and actual work. "To put it bluntly, I feel that this job is not suitable for me. Sometimes when chatting, I will find that other interns may have 985 bachelor's degrees, and some have returned from overseas with master's degrees. The gap between them is also very, very big."
Yang Xiaoyun expressed it more directly: "It may be an inappropriate metaphor. My mother went to high school, so she can do this job."
4. "We are actually assembly line workers"
The head of the product department at Stardust Data said: "As the basic capabilities of large models are completed and development becomes more vertical and complex, the tasks will gradually change, and tools and personnel will need to be updated accordingly. However, large models are still at an early stage of development, and market demand for annotators varies with the task. Compared with CV (Computer Vision) projects, NLP (Natural Language Processing) projects place higher requirements on annotators' comprehension, professional terminology, and domain knowledge, and the corpus they provide must be accurate and reliable."
The person in charge said that the challenges large models pose to data annotation lie more in top-level design. For each data annotation task, the bigger challenges are understanding the customer's application scenario, designing a solution (data selection, data distribution design, pipeline design, and so on) that can be implemented efficiently and at low cost, and improving the efficiency and capability of the platform tools.
This relies on the participation of vertical domain experts as senior annotators, injecting domain expertise and experience into the design of the solution, and even participating in the iteration process of data quality inspection.
Zhang Ziqian, head of operations at data solution provider Besai Technology, said bluntly that, for training large models, there is currently no obvious difference in work difficulty or hourly wages between basic annotators and the annotators who used to draw boxes. **When fine-tuning large models and building vertical-domain solutions for customers, the biggest problem is how to build high-quality data sets, which requires labeling experts in professional fields such as IT, medicine, and finance. Such talent is still scarce.**
OpenAI put dozens of doctoral students into guiding and reviewing data annotation, and outsourced basic data annotation to data annotation companies scattered across low-income regions such as Africa and India. **Those who really make a difference are the senior annotators, who account for only a small proportion.**
Comparing the job descriptions for annotators that Baidu recruits at its Beijing headquarters and at its Haikou data annotation base shows that, although both train large models, the former are senior annotators responsible for guidance, training, and review, while the latter are basic data annotators. The two have vastly different salary levels.
**In other words, the higher-level senior annotators are the key talent for large-model training. Their work is more technical and more valuable, and their labor costs more.**
**In contrast, even though these interns from prestigious schools are training large models, at this stage their work is essentially the same as that of data annotators in the past.**
**Interns often joke among themselves that they are not working at a big tech company but at "Internet Foxconn", as workers on the assembly line. They can neither see where the results of their work will ultimately go nor build a horizontal chain of meaning with the people around them.**
This "Internet Foxconn" joke refers not only to the work of these interns, but also to the workload and management model, which is almost on par with the factory assembly line.
The amount of work interns must complete each day is set by a prescribed "human efficiency red line". Yang Xiaoyun has to label 32 questions a day. If she falls short of the red line, she must report the reasons or work overtime to finish. And she has to do this work while keeping up with the constantly changing standards from the alignment meetings and continuously collecting information.
To complete model training as quickly as possible, the annotation team is under high-pressure management. Yang Xiaoyun's group is not allowed to talk during working hours, and a few words of small talk may be punished with extra work. Interns who fail to finish are bombarded with reminders in the group chat. Even on sick leave, they may be interrupted by an urgent call from a full-time employee.
In addition, in order to ensure that the data is not leaked, the exchange of data annotations across groups is expressly prohibited. Even if interns from different groups are placed in close proximity, they cannot discuss work content. None of these interns know how many subdivided groups there are in the company's data labeling and how many interns there are. A group may have 10, 40, 50, 60 people, or hundreds of people on each floor.
Under the high-pressure human efficiency red line, Yang Xiaoyun can only be temporarily "happy" when encountering prohibited questions. Because content involving violence, pornography, and gore must be removed directly, but it can still be counted in personal work items. "It's equivalent to tightening a bad screw. You will only be happy that you don't have to tighten the screw." During the division of labor in the morning, the interns even competed with each other to get the prohibited items.
After Yang Xiaoyun left the job early, she often saw Moments posts from interns who were still in meetings at the company at 10 in the evening, or even at midnight. Some interns sent her voice messages in tears, but they had already rented an apartment and could not leave: quitting would mean wasting all the rent.
5. There will never be a shortage of people here
Li Zhuxi is one of the rare interns with data annotation experience. She studied cognitive linguistics. She explained that the direction of combining linguistics with neurology, observing brain imaging, including establishing brain-computer interfaces, has a certain connection with artificial intelligence.
Before coming to this big tech company, she had done data annotation for large language models at another one, and that was before the release of ChatGPT. In Li Zhuxi's recollection, after ChatGPT went mainstream, similar data annotation internships sprang up like mushrooms after rain.
She successfully completed the three-month internship, even though she described it as a "relatively mechanical and not very difficult" job. Li Zhuxi said that she cares more about the experience: "I don't expect this job to be interesting. It's still good to experience it. I not only gain internship experience at a big company, but also experience the unique corporate culture here."
For Zhao Shuo, a liberal arts student from a "double non" school (one in neither the 985 nor the 211 program), the artificial intelligence editing internship at a big tech company was already one of his better options.
When looking for a summer internship, he actually preferred an operations position at a research institute. The research institute is a public institution with official staff positions, which was very attractive to Zhao Shuo. "At that time, I was particularly looking forward to the feedback it could give me." But in the end, the institute did not choose Zhao Shuo, a first-year graduate student, and recruited a student in a higher year instead.
Some people push themselves even harder.
In Zhao Shuo's eyes, some interns work particularly hard and take on more tasks in search of a chance to become regular employees. A serious and diligent attitude wins the favor of full-time employees. "Leaders often talk with them and will also give them some authority to manage other interns."
The company even selects outstanding interns every week and posts their photos on the wall as recognition, but this does not necessarily come with a bonus, and there is none in Zhao Shuo's business line.
Jia Yuhang, general manager of Cloud Measurement Data, told "Jiazi Guangnian" that there are two main promotion routes for data annotators. One is the expert route: after mastering the relevant skills in a specific vertical field, junior annotators can gradually become senior annotation experts. The other is the management route: becoming the manager of a project.
But Zhao Shuo would not choose to stay. After one year of graduate school, Zhao Shuo clearly realized that his expectations for future work had dropped. Feeling the increasing changes in the general environment and observing the dissatisfaction of students who chose employment after graduation, Zhao Shuo's previously expected "high-end, sophisticated" and "irreplaceable" jobs were gradually replaced by a stable job. As a liberal arts student, he is anxious that he has not yet mastered irreplaceable skills, and hopes to find a job that is managed within the establishment.
When chatting, the interns would lament to each other that the work they are doing may soon be replaced by machines, and manual data feeding will no longer be needed.
For Jia Yuhang, general manager of Cloud Measurement Data, similar concerns do not exist. With the actual mass production of algorithms and the enhancement of data closed-loop capabilities, the overall amount of labeled data and the amount of manual data labeling are still increasing year by year. In the past, it was 100% manual annotation, but now there is a certain proportion of manual annotation, automatic annotation, and manual verification. In the future, the proportion of automatic labeling may become larger and larger. However, although the proportion of manual annotation is decreasing, with the gradual development of the artificial intelligence industry and the increasing amount of data, the amount of manual annotation will continue to increase.
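The arithmetic behind this claim can be sketched with made-up numbers: if the total amount of labeled data grows faster than the manual share shrinks, the absolute amount of manual work still rises. All figures below are hypothetical and are not taken from Jia Yuhang or Cloud Measurement Data.

```python
def manual_items(total_items: int, manual_share: float) -> float:
    """Number of items that still need a human, given the manual share (0 to 1)."""
    return total_items * manual_share


# Hypothetical figures for two points in time.
before = manual_items(1_000_000, 1.0)   # everything is labeled by hand
after = manual_items(8_000_000, 0.25)   # only a quarter is labeled by hand
print(before, after)
```

In this example the manual share falls from 100% to 25%, yet the number of manually labeled items doubles because the total grows eightfold.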
After leaving the job early, Yang Xiaoyun found a game planning internship that she liked. The working atmosphere there was relaxed, and she found it more rewarding. For her, artificial intelligence editing was an "unlucky" internship experience. For Ding Xiaoyu, it was a process of disenchantment: even at the big-company internship she had been looking forward to, she still faced endless boring work. She felt that this might be because her abilities were not strong enough or because she had too little experience.
But there will never be a shortage of people there.
Yang Xiaoyun heard that after she left, the team expanded from dozens to hundreds within a month. Ding Xiaoyu discovered that every 10 days, a new batch of interns would come, each batch consisting of twenty or thirty people.
“You may go away cursing and telling the world how bad your job is, but there will be a steady stream of new people coming in to fill your spot.”
*At the request of the interviewees, the characters Chen Xi, Yang Xiaoyun, Ding Xiaoyu, Li Zhuxi, and Zhao Shuo in the article are pseudonyms.