Midjourney CEO: AI should be an extension of ourselves

Tencent Technology News, July 7: Midjourney CEO David Holz delivered a speech at the 2023 World Artificial Intelligence Conference, arguing that AI will become a new carrier and engine of creativity and imagination: through AI, we have the potential to amplify the raw imagination of the entire human race. On the company's name, Holz said it comes from the idea of being in the middle of the journey, drawn from the Taoist classic Zhuangzi by Zhuang Zhou. He believes classical Chinese literature contains some of the most beautiful and profound thought.

Midjourney is currently developing version 5.3. Version 6 will add a series of zooming and panning capabilities that automatically generate new images from different vantage points, along with control over the randomness of generated images, letting authors strike a balance between conventional beauty and the strange or bewildering. Longer term, Midjourney aims to generate three-dimensional, real-time, dynamically adjustable images.

Regarding the future of the technology, he is not sure where it will go, but hybrid models (models that fuse image and text generation) are a likely direction. He believes the potential of AI has not been fully realized, and that a tenfold, even hundredfold, improvement over today is all but inevitable.

He believes that most technological progress so far has come from trying to make people better, from amplifying human capabilities. AGI may therefore not be necessary: AI as an extension of human beings is the better way to empower them.

The following is the transcript of the speech:

Hello everyone, I am David Holz, CEO and founder of Midjourney. I am honored to be invited by the Shanghai Municipal Government to participate in the World Artificial Intelligence Conference, and I look forward to joining today's event.

One of the most important technologies in the world is the engine. An engine is a machine that generates, transfers, or amplifies power. Engines drive the vehicles we build in our factories: cars, planes, and boats. And now it is time to think of AI as a new kind of engine.

At Midjourney, we are trying to use this engine to create a new kind of vehicle: not one that carries our bodies, but one that carries our thinking and imagination.

A soccer ball can take you around the world, but you still need legs to kick it. We hope to create a new kind of vehicle you can use to imagine, not just to move. Before we can create, we must first imagine what we can be, where we can go, what is possible. More than anything else, the tools we make are focused on amplifying the primal power of imagination. We have the opportunity to amplify not just any one individual, but the imagination of the entire human race.

I visited China many times with Leap Motion (a gesture-recognition company), whose first office was in Shanghai. Shanghai has a special feel that I love; it seems to combine San Francisco, Los Angeles, New York, and some old European cities. It has the depth of an ancient history and culture, but also a raw sense of the future. Those are two of my favorite things.

In fact, I'm an avid reader of science fiction, and the wildest settings I've seen come from Chinese classics. I think ancient Chinese literature contains some of the most beautiful and profound thought in human history. The name Midjourney actually comes from a translation of one of my favorite ancient Taoist texts, the Zhuangzi by Zhuang Zhou. I love its parables: Zhuang Zhou dreaming he was a butterfly, "you are not a fish" (zi fei yu), the cook carving the ox (pao ding jie niu), the useless tree, the empty boat. What I like about the name Midjourney is that people tend to forget the past and can feel lost and uncertain about the future. But I feel we are actually midway on a journey: we come from a rich and beautiful past, and we have a wild and incredible future ahead.

We recently released Midjourney version 5.2 and are currently working on version 5.3. After that I hope to release a major update, which I hope will be called version 6. The latest feature we've introduced is all about zooming out: as you zoom out, you can create different stories and environments that change around a central theme. This week we're releasing a similar feature that lets you pan the camera; as you move it sideways, you can keep changing the prompt and tell a story. We're also releasing a control system that combines these new features for finer control over image generation.

You can also combine this with style controls. "Style control" sounds a little confusing, but the idea is that you tell the AI how beautiful you want the result to be, and how much risk to take in pursuit of that beauty. Even when the output is unconventional, messy, or weird, the results are sometimes truly remarkable.

Sometimes you need to be adventurous, and this lets you control the balance between risk and randomness on one side and attention to an image's general beauty on the other. We've also introduced something we call turbo mode, where we use as many GPUs as possible to make image generation very fast: four to five times faster. It is as if you were generating images with 64 or even more than 100 GPUs; a computer with that much power would cost about 500,000 US dollars. That sounds kind of crazy, and we're working on even crazier things. Most of them are still brewing, but we think that over time Midjourney will evolve to create not only 2D images but 3D images and moving images, where you can even interact with the pixels themselves. In the future you may be able to reflow and reshape what you draw in real time.
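In the shipped product, this balance between risk and conventional beauty is exposed through prompt parameters. As a sketch (using the `--chaos` and `--stylize` parameter names from Midjourney's public documentation; the prompt text itself is purely illustrative):

```
/imagine prompt: an ancient boat drifting in mist --chaos 60 --stylize 250
```

Higher `--chaos` values produce more varied, unexpected initial grids, while `--stylize` controls how strongly Midjourney's default aesthetic is applied over the literal prompt.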

All one needs is a sufficiently massive AI processor, and it can dream up all these different worlds, and those dreams can interact with our minds; in a sense we dream through the AI, and that's going to be really cool. It was the successive arrival of diffusion models, Transformer models, and the CLIP model that let AI enter the image space. About two years ago, before any image-AI service had launched, our researchers were all talking with each other in San Francisco. I remember saying that these models, especially diffusion models, would bring something completely different from generative adversarial networks (GANs), the technology everyone had previously used for image generation.

I just remember everyone immediately nodding in an unusual way, agreeing that the diffusion model really was different. It was a serious moment, and I had a strong feeling that I had to get involved and bring a more human user interface to this technology.

But as for the future, it is hard to know how the technology will develop. Sometimes we talk about pushing language models toward diffusion, that is, using diffusion models to generate text. Or image models may become more like language models; the technical term for that approach is the autoregressive Transformer. Or AI may develop toward hybrid models. It is genuinely hard to tell. I think we're only at the beginning of this change, but I'm 100% sure there is a lot of progress ahead: a tenfold, even hundredfold, improvement is likely inevitable.
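For readers unfamiliar with the diffusion models discussed above, here is a minimal sketch of the reverse (denoising) loop at their core. The noise schedule values and the `predict_noise` stand-in are illustrative assumptions; a real system uses a trained neural network there:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50                                  # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)      # illustrative noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # cumulative signal retention

def predict_noise(x, t):
    # Hypothetical stand-in for a trained network epsilon_theta(x, t).
    return 0.1 * x

x = rng.standard_normal(4)              # start from pure Gaussian noise
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    # DDPM mean update: remove the predicted noise, rescale the signal
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        # re-inject a small amount of noise at every step except the last
        x += np.sqrt(betas[t]) * rng.standard_normal(4)
```

Generating an image this way requires one network evaluation per step, which is part of why the autoregressive and hybrid alternatives mentioned above remain open research directions.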

This advancement is not just in raw performance but in the user interfaces and products that let us use these technologies well, so that individually and collectively we can make really cool things and solve problems better. Douglas Engelbart created one of the first text editors. Initially, computers were programmed by punching holes in cards. Douglas started thinking about what would happen if we used computers to program computers, which sounded crazy at the time. His idea was that by programming computers on computers we could speed up the cycle, improve what we do, make computers more powerful, and amplify everything. That idea came true. Even across different cultures, such as AI, human-computer interfaces, and intelligence augmentation, I think most technological progress so far has come from trying to make people better, from amplifying human capabilities.

We haven't really seen the age of AI arrive, when independent AIs will solve problems on their own. But if we think too hard about moving in that direction, we may miss many of the opportunities already in the technology. I think not only about what AI can do, but about how to create fluidity and entanglement between things. A tool shouldn't feel like a person; it should feel like an extension of yourself, your body, your mind. I'm thinking about how to build these technologies so that humans and AI intertwine, so it doesn't feel like collaborating with an artist but more like imagining something and seeing it appear on screen. Many people describe Midjourney as if the images were part of their own thinking. I think that is what most AI should be like: an extension of ourselves.

So I want to say thank you again to Mr. Chen and to the entire audience. WAIC is pretty cool and I hope I can attend in person in the future and be a part of this event. I am looking forward to more cooperation with China, I remember all the wonderful personal experiences I had there, and I hope everyone can enjoy the fun of interacting there too. Thanks.
