A Stanford PhD student drops out to start a company, and instantly sets the AI world ablaze!
The new product takes aim at AI video generation, and right out of the gate it has become the industry's hottest act, drawing crowds of big names to watch and comment.
OpenAI heavyweight Andrej Karpathy reposted it, attaching an impassioned note:
Anyone can be a director of multimodal dreams, like the dream architect in Inception.
Stability AI's founder also stopped by to leave a like:
The new product is called Pika 1.0, from the company Pika, founded in April this year.
Keep in mind that this field is already crowded with similar products; Runway, for one, has been at it for five years.
Amid the eye-catching frenzy of AI video generation, what exactly did this newcomer do to break through so quickly and attract so much attention?
Judging from the demos, Pika 1.0 not only generates smooth video clips from text and images, it flips between static and dynamic in an instant:
Editability is also remarkably strong: point at any element in the video, and a single instruction like "change the clothes" gets it done in moments:
Results like these have won the half-year-old company more than 520,000 users.
What's more, it has just raised US$55 million, with individual investors including Quora founder Adam D'Angelo, Perplexity CEO Aravind Srinivas, and former GitHub CEO Nat Friedman.
So is Pika really as good as it looks? We tried it out right away.
What does the new king of AI video look like?
The product that went viral this time, Pika 1.0, is the first official release from Pika.
After more than four months of testing in its Discord community, Pika decided it was time to roll out this major upgrade.
Where the previous Pika could only generate videos from text or images, today's Pika 1.0 offers far richer functionality:
It can now generate videos from text, images, or other videos (including restyling), and can also edit selected parts of a video.
How editable is it?
The editing capability is much stronger now. For one thing, the canvas can be resized at will, switching seamlessly among four aspect ratios: 5:2, 1:1, 9:16, and 16:9 widescreen:
As for style filters such as 3D, anime, and cinematic looks, those hardly need mentioning.
Most importantly, version 1.0 introduces a more user-friendly web app, so you can get started directly without repeatedly @-ing the bot on Discord.
(For now there is still a queue, so some patience is required.)
Of course, you can also go to the Discord community to experience it first.
Although we couldn't try Pika 1.0 on the web yet, we tested the Discord bot's text-to-video and image-to-video generation, and the results are not bad.
After joining the server, head straight to "Creations" and hop into any of the 10 generation channels to get started.
In the input box, type "/" and select the simplest command, "/create":
Here we gave the bot the prompt "a robot dancing in the rain, sunset, 4k, -gs 8".
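For reference, the full slash command in Discord's input box looks roughly like the sketch below. (A sketch, not official syntax: the `prompt:` field name follows Discord's slash-command convention, and `-gs` is Pika's guidance-scale parameter, which controls how closely the output follows the text; treat the exact field and flag names as subject to change.)

```
/create prompt: a robot dancing in the rain, sunset, 4k -gs 8
```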
It takes only about half a minute for the video to pop out:
In the result, the rain isn't very pronounced, but the robot's sense of physical motion is genuinely strong.
Let's try a slightly longer prompt:
a teenager walks through the city streets, takes pictures of places
The result comes back just as fast:
Wow, this time we're genuinely satisfied: the scene matches what we had pictured, and then some.
In addition to pure text, we can also upload a reference image to create from, using the "/animate" command.
Boom, a static meme comes to life:
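The image-to-video flow has a similar shape; the sketch below assumes the bot exposes an image attachment field plus an optional text prompt (the exact field names here are our assumption and may differ):

```
/animate image: <attach a reference image> prompt: <optional description of the motion you want>
```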
To sum up: Pika's clips run around 3 seconds (so overly long prompts are pointless, as the trailing part gets ignored), and not every result will satisfy, but a few attempts usually turn up something good.
Beyond our own tests, let's look at what netizens have made; some of it is very impressive.
For example, someone created this clumsy, adorable little monster that makes you want to reach out and pet it:
There's also this scene of two little girls playing instruments; watching it, you can almost hear the music:
Most stunning of all is this scene of white doves fluttering around a short-haired beauty:
What an atmosphere, right?
Having seen these results, let's dig into the company behind them.
Founded by two Chinese Stanford PhD students
Pika has two founders, Demi Guo and Chenlin Meng, both PhD students at Stanford.
According to The Information, Guo founded Pika in April this year, with Chenlin Meng joining soon after as co-founder to build this text-to-video generation model together.
Academically, the two focus on NLP and computer vision respectively, and both have research experience in generative AI.
Co-founder and CEO Demi Guo (Guo Wenjing) was a PhD student at Stanford University's AI Lab, working on NLP & Graphics.
Born in the US and raised in Hangzhou, she attended Hangzhou Foreign Language School, picked up programming early, and won an IOI silver medal. She went abroad for university, gaining early admission to Harvard.
Her LinkedIn profile currently shows "On Leave", presumably so she can focus on the startup.
Before her PhD at Stanford, Guo earned a Master's in Computer Science and a Bachelor's in Mathematics at Harvard.
During her undergraduate years, she took a gap year to work full-time as a research engineer at Facebook AI Research.
There she took part in research that used Transformers to analyze 250 million protein sequences; that paper now has over 1,200 citations, among them one from the later-famous AlphaFold2:
She has also interned at companies including Epic Games, Google, and Microsoft.
Her advisor, Christopher D. Manning, has given the startup considerable support.
Manning is well known for his NLP research, with Google Scholar citations now exceeding 230,000, and he will serve as one of Pika's academic advisors.
Co-founder and CTO Chenlin Meng was likewise a Computer Science PhD student at Stanford.
Where Guo's research experience leans toward NLP, Meng's academic background runs deeper in computer vision and 3D vision. She took part in the research on denoising diffusion implicit models (DDIM), a paper that now has over 1,700 citations:
She has also published multiple generative AI papers at top conferences such as ICLR, NeurIPS, CVPR, and ICML, many of them selected for oral presentations.
Of course, with Pika 1.0 taking off, the company has started hiring further across engineering, product, and operations.
5 product launches in one month
It's worth noting that Pika isn't the only one expanding at full tilt.
The AI video field as a whole has entered a boom period of late.
By an incomplete count, in just the one month from early November until now, five AI video generation products have launched or shipped major updates:
First, on November 3rd, Runway's Gen-2 shipped a milestone update, adding support for ultra-realistic 4K output.
Then on November 16th, Meta unveiled Emu Video, which human evaluations suggest beats rivals like Gen-2 and Pika. The results look like this:
From Emu onward, it was as if a starting gun had fired, and companies scrambled to keep pace.
Just two days later, on November 18th, ByteDance abruptly released PixelDance, whose visuals show unprecedented dynamism: large motion without deformation, leaving quite an impression.
Three days after that, on November 21st, AI heavyweight Stability AI finally launched its own video tool, Stable Video Diffusion.
The effects are also quite competitive.
That same day, Gen-2 didn't sit idle either, launching a motion brush for localized control, an important milestone for the controllability of generative models.
Finally, today, November 29th, the startup Pika officially launched version 1.0 of its web interface, squaring off directly against "big brother" Runway.
What's more, we've never before seen so many distinctively featured products, from teams of such different backgrounds, racing to launch at nearly the same time.
It makes one exclaim:
Is AI video on the eve of an explosion?!
Feel free to share your thoughts!
How to Use AI Video Models on One All-in-One Platform?
Are you excited about these latest AI video technologies? Raring to give them a try? Unfortunately, Pika is still in its waitlist phase. But don't worry: as soon as they release an API, Anakin will integrate it right away!
Anakin is an all-in-one AI platform where you can try the latest models like GPT-4-Turbo and Dalle-3: hold text conversations and generate beautiful images, no complex AI painting skills required.
Sign up and give it a try!