Fighting AI with AI, the “evolution theory” of large model security

巴比特_

2023-09-14 05:07:28

Text丨Liu Yuqi

Editor｜Wang Yisu

Source丨Light Cone Intelligence

“Are we more dangerous or safer in the Internet age?”

In 2016, when the Internet was developing at a rapid pace, these big characters and two lines of slogans often appeared in elevator advertisements. From viruses and Trojans to online fraud, the thinking of security and the establishment of security prevention technologies have been racing against the development of science and technology. Likewise, the early days of the big model era also gave rise to many security considerations.

Ten years after the invention of the Internet, Internet protection technology and the industrial chain began to be completed. Based on the experience of the past few decades, in less than half a year after the birth of the big model, around model security and data security, , content security discussions have been endless.

In the past week, at the Shanghai Bund Conference, Pujiang Innovation Forum, National Cyber Security Week and other occasions, industry, academia and research communities have focused on the data security issues (including data poisoning, information leakage, copyright risks, etc.) caused by the implementation of large model applications, model A series of discussions were held on security issues (security vulnerabilities in the model itself, malicious exploitation, etc.), content security issues (generated content containing sensitive information such as violations, illegality, pornography, etc.), AI ethics issues, etc.

How to protect large models?

Some domestic security manufacturers, such as 360, Ant, Sangfor, Qi’anxin, Shanshi Technology, etc., are actively developing large-model security technologies.

Large models need "doctor" and "bodyguard"

The birth of a large model as a new species requires safety monitoring during the training process. When the large model is finally introduced to the market, it also needs a "quality inspection". After quality inspection, it enters the market and needs to be used in a controllable manner. This is all It is a macro approach to solving security problems.

Whether it is a general large model or an industry large model for vertical fields, currently, model security protection is mainly divided into three parts:

The first is the data problem in the training phase: if the data is improperly collected, biased or mislabeled, or the data is poisoned, it may cause the large model to produce erroneous output, discrimination or other negative effects. At the same time, the data will also be affected during the application process. Facing risks such as data leakage and privacy exposure;

The second is the controllability problem of the model itself: the reliability, stability, robustness, etc. of the model need to be tested. For example, users have previously constructed targeted statements to induce the model, and large models may produce information with fraud, discrimination, and politics. Tendency and other risk content;

The third is the security issue of the application of large models in actual scenarios: During actual use, the interactions and applications of different user groups need to be carefully evaluated, especially in fields such as finance and medical care, which have extremely high requirements on the correctness of model output. If If used improperly, one stone can easily cause thousands of waves.

Many industry insiders told Guangcone Intelligence: "Model safety requires an integrated technical protection system, and the control of one link alone cannot solve the fundamental problem."

Referring to the development path of Internet security, many "virus detection and killing" software companies have been born. Generally, detecting and locating problems is often the first step.

Lightcone Intelligence learned that Ant’s “Yitianjian” includes the large-model security detection platform “Yitianjian 2.0” and the large-model risk defense platform “Tianjian”, covering the entire chain from detection to governance to defense. Antjian 2.0 can perform multi-dimensional security scans on large models to check for existing data security risks, content risk points and other issues. It is equivalent to standing in the perspective of "black industry" and using intelligent attack and confrontation technology to automatically generate millions of inductive questions, conduct inductive questions and answers on the large generative model, and find out the weaknesses and loopholes of the large model.

From a technical perspective, Yijian adopts the latest "adversarial intelligence" technology route, using intelligent adversarial technology to continuously "project questions" to large models, observe the answers generated by the model, and determine whether there are risks. Through continuous "torture", just like a doctor asking a patient's symptoms multiple times, the platform can interrogate and analyze the health status of the large model.

It has become a mainstream technology trend to improve the security of large models by generating adversarial samples and developing algorithm systems for detecting adversarial samples. In the industry, giant companies such as OpenAI, Google, Microsoft, and NVIDIA have applied counter-intelligence technology to their products and services.

For example, under this technical idea, the CleverHans system developed by the University of Toronto is like a "thief" specially designed to test the anti-theft system. It will deliberately add some small interference to try to deceive the AI security system. Under normal circumstances, the AI system can accurately identify the picture of a "kitten", but the CleverHan system has to slightly modify a few pixels on the picture of a "kitten" to give the AI the illusion that it is a picture of a puppy. If the AI system is fooled, it means there is a security vulnerability.

Compared with detection and "diagnosis", "prevention and treatment" are also very important. Ant Tianjian is like a smart shield that can prevent problems before they occur. By intelligently analyzing the intention of users to ask questions for defense, Tianjian can intercept certain malicious questions that try to induce the model to generate sensitive content, ensuring that external malicious induction cannot be introduced into the large model. At the same time, secondary filtering is implemented on the model output content to automatically identify risk information and intervene to ensure that the content output by the large model complies with specifications.

More importantly, data issues are the source of model security. Shi Lin, director of the Institute of Cloud Computing and Big Data of the China Academy of Information and Communications Technology, once shared at an academic exchange meeting: "Many security vendors have now adopted security measures, including We will do some cleaning of the training data, filter the input and output content, and also take security prevention and control measures such as monitoring and identification.”

This requires the defense platform to act at the data source to address issues such as toxic data sources and uncontrollable model depth black boxes. Zhu Huijia, Director of Content Algorithms of Ant Group’s Big Security Machine Intelligence Department, said that Tianjian is currently trying to ensure model security through data detoxification, alignment training, and interpretability research.

Use magic to defeat magic, AI to fight AI

The content characteristics in the digital world and the world with human eyes are different.

With the advent of the era of large models, its powerful capabilities have also provided new ideas for the transformation of security protection technology. “Using the power of AI to fight AI” has become a hot topic.

In fact, adversarial attack and defense ideas are not exclusive to model security. As early as the last decade, in the face of various security threats, the field of artificial intelligence has gradually formed the security concept of "attack, test and defend - attack to promote defense - attack and defense integration", and continues to explore by simulating various attack scenarios. Weaknesses in models and systems are used to promote the strengthening of defense capabilities on the algorithm and engineering side.

However, in the past, security protection mainly relied on machine learning algorithm models, which required the accumulation of a large amount of professional data knowledge, and faced the problems of knowledge blind spots and untimely cold start of small samples. Using large model technology, more intelligent security prevention and control can be achieved.

This is reflected in several aspects. First, large models can provide intelligent security “consultants”. Large models pre-trained based on massive texts can become excellent “consultants” and propose appropriate analysis and defense strategies. For example, through simple natural language description, the security situation can be quickly analyzed, suggestions for countermeasures can be made, and the security team can be assisted in planning solutions. This is similar to a smart security "little assistant".

Judging from the current situation in the industry, there is still a lack of a set of easy-to-use and standardized evaluation tools and rules for how to evaluate the safety of AI.

This is also another aspect that can be supplemented in large model defense. It uses large model technology to learn risk knowledge and standard rules to improve AI's cognitive understanding of risks, so as to achieve extremely fast defense and rapid cold start by using large models against large models. the goal of.

Large model security requires both "fast" and "slow". These two logics are not contradictory. In terms of large model security defense, we need to be "fast" and be able to quickly detect and kill viruses to ensure that the service is poison-free. This includes several key defenses such as "data detoxification", "safety guardrails" and "AIGC risk detection". In terms of security and trustworthiness of large models, we need to be "slow" and ensure the controllability and trustworthiness of the entire system environment in a long-term and systematic way. This includes "security evaluation", "deconstruction and controllability", " Co-governance of human society" and other aspects.

Taking text security as an example, large models can be trained based on security standard rules, risk domain knowledge and historical risk samples to improve the model's understanding of risk standards and content, thereby improving risk detection capabilities. It also uses large model generation capabilities combined with security knowledge graphs to construct attack samples and continuously iteratively optimize the detection model.

A security expert said: "Compared with the limited samples collected manually, the massive and diverse samples generated by large models will make the security detection model 'well-informed' and adapt to new threat methods faster."

This technology has also been used by Ant in AIGC content detection. Zhu Huijia mentioned: "AIGC deep forgery detection also adopts the idea of attacking, testing and defending, and using attack to promote defense. It generates through different methods, different styles, and different generation models, and establishes nearly tens of millions of deep forgery data to train the model. Quickly distinguish whether content is machine-generated or artificially generated, thereby achieving a detection model with better generalization and robustness.”

In response to the problems caused by AIGC during its application, some leading companies in the world have begun to make plans.

OpenAI has previously stated that it is considering adding digital watermark technology to ChatGPT to reduce the negative impact of model abuse; Google stated at this year’s developer conference that it will ensure that every AI-generated image of the company has an embedded watermark; this year In early January, Nvidia also released a software called FakeCatcher to find out whether the faces in the video are deep fakes.

Looking back at the history of the development of the Internet, chaos and rapid development are often "twin brothers". It was after the industrialization of network security matured that the Internet truly ushered in the application of a hundred flowers.

Similarly, model security is not just the task of a certain security manufacturer, but only when security technology forms a trusted fence can large model technology really "fly into the homes of ordinary people."

"Large models are very complex issues. The complexity of ethics, data, training and other fields is unprecedented. It is a new field and a proposition before everyone. Ant's 'Yitianjian' from the perspective of large model security We have done some exploration on it, but there are still many problems to be researched and solved, such as the authenticity and accuracy of the answers. It also needs continuous iteration and improvement, and requires the joint efforts of the whole society." Zhu Huijia finally said.

View Original

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.

1 Likes

Reward
1
Comment
Repost
Share

Comment

0/400

No comments

巴比特_

Trending TopicsView More
#Fed Rate Cut Ahead
18.3K Popularity
#Funny Moments In Crypto
28.7K Popularity
#My Pick In RWA
33.8K Popularity
#Crypto Market Rebound
271.2K Popularity
#Gate Alpha Treasure Hunt Phase 3
821 Popularity

Sitemap