Patronus AI: Lightspeed Venture Partners leads a US$3 million investment in a company targeting the enterprise market to solve large-model security issues
**Source:** SenseAI Deep Thought Circle
Sense thinking
We try to put forward more divergent deductions and reflections based on the content of the article, and welcome exchanges.
▪ Pain points in enterprise applications of large models: next-token prediction with an autoregressive transformer is essentially probabilistic, so assessing the uncertainty of generated content is key to verifying model capability. At the same time, academic benchmark evaluation does not transfer to enterprise field applications, and a more product-oriented, multi-model automatic evaluation platform is needed.
▪ Balancing accuracy and uncertainty in generated content, and extending LLM capabilities to real business scenarios, is the art of model evaluation platforms and enterprise-level Gen-AI applications.
This article has a total of 2115 words. It takes about 5 minutes to read carefully.
Users are adopting generative AI at an unprecedented rate. ChatGPT is the fastest-growing consumer product ever, attracting more than 100 million users within the first two months of launch. AI has been in the spotlight this year. At the same time, enterprises have been cautious about deploying AI products rapidly, because they worry about the errors that large language models can cause. Unfortunately, current efforts to evaluate and inspect language models are inefficient and difficult to scale. Patronus is committed to changing that: its mission is to increase enterprise confidence in generative AI.
Founding background of Patronus AI
The two founders of Patronus, Rebecca and Anand, have known each other for nearly 10 years. After studying computer science together at the University of Chicago, Rebecca joined Meta AI (FAIR) to lead NLP and alignment-related research, while Anand developed early causal inference and experimentation foundations at Meta Reality Labs. At Meta, the two experienced firsthand the difficulty of evaluating and interpreting machine learning output: Rebecca from a research perspective and Anand from an application perspective.
When OpenAI CTO Mira Murati announced the release of ChatGPT on Twitter last November, Anand forwarded the news to Rebecca within 5 minutes. They realized that this was a transformational moment and that companies would quickly apply language models to a wide range of scenarios. So Anand was surprised when he heard that Piper Sandler, the investment bank where his brother worked, had banned internal access to OpenAI. Over the next few months, they heard repeatedly that traditional companies were moving forward with this technology very cautiously.
They realized that although NLP technology had made significant progress, it was still far from real enterprise use. Everyone agreed that generative AI was very useful, but no one knew how to use it in the right way. They recognized that AI evaluation and safety would be top issues in the coming years.
Team and financing situation
Patronus announced on September 14, 2023, that it had received US$3 million in seed funding led by Lightspeed Venture Partners. Factorial Capital, Replit CEO Amjad Masad, Gokul Rajaram, Michael Callahan, Prasanna Gopalakrishnan, Suja Chandrasekaran, and others also participated. These investors have extensive experience investing in and operating benchmark companies in enterprise security and AI.
The founding team of Patronus comes from top ML (machine learning) application and research backgrounds, including Facebook AI Research (FAIR), Airbnb, Meta Reality Labs and quantitative finance. They have published NLP research papers at top AI conferences (NeurIPS, EMNLP, ACL), designed and launched Airbnb’s first conversational AI assistant, pioneered causal inference at Meta Reality Labs, exited a Mark Cuban-backed quantitative hedge fund, and built 0→1 products at fast-growing startups.
Patronus is advised by Douwe Kiela, CEO of Contextual AI and adjunct professor at Stanford University, who is also the former director of research at Hugging Face. Douwe has done pioneering research in the field of NLP, especially in evaluation, benchmarking and RAG.
Problems Patronus AI solves
Current large language model evaluation is not scalable and performs poorly for the following reasons:
Manual evaluation is slow and costly. Large enterprises spend millions of dollars hiring thousands of internal testers and external consultants to manually check for bugs in AI. Engineers who want to deploy AI products spend weeks manually creating test sets and checking AI output.
The non-deterministic nature of large language models makes predicting failures difficult. Large language models are probabilistic systems. Because their input range is unrestricted (within the context length limit), they present a wide attack surface, and the causes of failure can be very complex.
There is currently no standard testing framework for large language models. Software testing has been deeply integrated into traditional engineering workflows, with unit testing frameworks, large quality inspection teams, and release cycles, but companies have not yet developed similar processes for large language models. Continuous and scalable evaluation, identification and documentation of large language model errors, and performance benchmarking are critical to the production use of large language models.
Academic benchmarks do not reflect real-world situations. Enterprises currently test large language models on academic benchmarks (such as HELM, GLUE, SuperGLUE, etc.), but these benchmarks cannot reflect real usage scenarios. Academic benchmarks tend to be saturated and suffer from training data leakage issues.
The long tail of AI failure is very serious, and the last 20% is extremely challenging. Adversarial attacks have shown that the security problem of large language models is far from solved. Even if general-purpose pre-trained language models demonstrate strong basic capabilities, there are still a large number of unknown failure situations. Patronus has done a lot of groundbreaking research on adversarial model evaluation and robustness, but this is just the beginning.
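To make the points about non-determinism and the missing test framework concrete, here is a minimal sketch of a unit-test-style check that samples a model repeatedly and measures how often its answers agree. It is an illustration only, not Patronus AI's method: the `Model` interface, `make_toy_model`, `consistency_rate`, `is_stable`, and the 0.9 threshold are all hypothetical, and a seeded random stub stands in for a real language model.

```python
import random
from collections import Counter
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


def make_toy_model(seed: int, error_rate: float) -> Model:
    """Stand-in for a sampled LLM: answers "wrong" with probability error_rate."""
    rng = random.Random(seed)

    def toy_model(prompt: str) -> str:
        return "wrong" if rng.random() < error_rate else "right"

    return toy_model


def consistency_rate(model: Model, prompt: str, n_samples: int = 20) -> float:
    """Share of repeated samples that agree with the most frequent answer."""
    answers = [model(prompt) for _ in range(n_samples)]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / n_samples


def is_stable(model: Model, prompt: str, min_rate: float = 0.9) -> bool:
    """Unit-test-style check: pass only if answers are consistent enough."""
    return consistency_rate(model, prompt) >= min_rate
```

A check like `is_stable(model, prompt)` could run in a release pipeline the way a conventional unit test does. Agreement between samples only measures consistency, not correctness, so a real evaluation would also need reference answers or a scoring model.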
Patronus AI’s Mission
Patronus AI's mission is to increase enterprise confidence in generative AI.
Patronus AI is the industry’s first automated evaluation and security platform for large language models. Customers use Patronus AI to detect large language model errors at scale to safely deploy AI products.
The platform automatically performs:
Scoring: Evaluate model performance and key metrics such as hallucination and safety in real-world scenarios.
Test generation: Automatically generate large-scale adversarial test sets.
Benchmarking: Compare models to help customers determine the best model for a specific use case.
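The scoring and benchmarking steps can be pictured with a small sketch that scores several models on one shared test set and ranks them. This is an assumption-laden illustration rather than the platform's actual API: the names `exact_match_score` and `benchmark`, the toy models, and the test cases are hypothetical, and exact-match accuracy is a simple stand-in for metrics such as hallucination or safety.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]   # hypothetical interface: prompt in, text out
TestCase = Tuple[str, str]     # (prompt, expected answer)


def exact_match_score(model: Model, cases: List[TestCase]) -> float:
    """Fraction of cases where the output matches the expected answer."""
    hits = sum(
        1
        for prompt, expected in cases
        if model(prompt).strip().lower() == expected.strip().lower()
    )
    return hits / len(cases)


def benchmark(models: Dict[str, Model], cases: List[TestCase]) -> List[Tuple[str, float]]:
    """Score every model on the same cases and rank them best-first."""
    scores = [(name, exact_match_score(model, cases)) for name, model in models.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data and toy models, purely for illustration.
CASES: List[TestCase] = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
ANSWERS = dict(CASES)

MODELS: Dict[str, Model] = {
    "model_a": lambda prompt: ANSWERS.get(prompt, ""),  # answers every case
    "model_b": lambda prompt: "4",                      # always says "4"
}

ranking = benchmark(MODELS, CASES)
```

Running every candidate against the same cases is what makes the scores comparable. For a specific use case, the test set would be drawn from that domain rather than from a general academic benchmark.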
Patronus expects frequent evaluations to adapt to continually updated models, data, and user needs. The ultimate goal is to obtain a credibility mark. No company wants to see their users dissatisfied with unexpected failures, or even negative press and regulatory issues.
In addition, Patronus aims to be a trusted third-party evaluator, because users need an unbiased, independent perspective. Patronus wants everyone to think of it as the Moody's of AI.
Patronus' current partners include leading AI companies Cohere, Nomic and Naologic. Several well-known companies in traditional industries, such as financial services, are also in talks with Patronus AI about pilot projects.
Do not go gentle into that good night,
Rage,
rage against the dying of the light.
—— Dylan Thomas (1954)