2020-04-21
The ability to edit videos quickly is a skill that stands out in the era of social media, where short videos are king, but mastering it requires a mix of creativity and technical proficiency. If you don’t want to spend hours in front of a computer editing clips, artificial intelligence can help.
VidPress is an AI-powered video synthesis tool Baidu Research recently developed in an effort to churn out sleek, professional video content in one click. Given a URL as input, VidPress can automate the creation process from choosing clips that fit the topic to knitting video content with AI-synthesized narratives.
Deployed on Baidu’s short video app Haokan, VidPress has provided a major boost in both the quantity and quality of videos. VidPress can produce over 1,000 news videos per day, compared to the 300-500 videos previously produced by human editors, with an average 15 percent increase in video completion rate (the percentage of videos that are played through to the end).
How does artificial intelligence edit videos?
Imagine how difficult it is for AI to edit videos. In essence, it must first understand the story, condense it into a short script that fits the video length, synthesize the narration, find relevant clips in the available footage and arrange them on the timeline, and output a video aligned with the audio.
Considering the wide range of multimodal data involved, Baidu researchers have applied multiple techniques to the pipeline, including computer vision, natural language understanding (NLU), and speech synthesis.
VidPress first takes a URL as input, analyzes the web page using NLU models to help find matching media content, then enriches the story by aggregating relevant news from a wide range of sites.
An appealing video needs both narration and visual components. For the narration, VidPress uses multiple NLU models to create a short and fluent summary of the longer story, then converts the summary into synthesized speech using Baidu’s text-to-speech services.
To create the visuals, VidPress gathers images and video clips from the web page, the related news, an established media library, and Baidu’s search engine. It then cuts and selects clips that fit the topic by analyzing their semantics with computer vision techniques such as facial recognition, object detection, optical character recognition, and video understanding.
The critical step is to place video clips at positions that match the audio timeline. Using a self-developed attention-based timeline alignment algorithm, VidPress can segment a chunk of text into meaningful anchors, rank clips by their relevance to the anchors, and move high-ranked clips into the timeline first. The last step is to render the timeline into a video file.
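To make the anchor-and-rank idea concrete, here is a minimal Python sketch. It is not Baidu’s attention-based algorithm, whose details are not given here. It assumes each clip already carries a set of text tags (for example, from object detection or OCR), treats each sentence of the narration as an anchor, and scores relevance by simple word overlap. The names `Clip`, `split_into_anchors`, `relevance`, and `align` are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clip:
    name: str
    tags: set  # hypothetical labels, e.g. produced by object detection or OCR


def split_into_anchors(script: str) -> list:
    """Split the narration into sentence-level anchors (a stand-in for real segmentation)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]


def relevance(anchor: str, clip: Clip) -> float:
    """Toy score: the fraction of the clip's tags that appear in the anchor text."""
    words = set(re.findall(r"[a-z0-9]+", anchor.lower()))
    return len(words & clip.tags) / len(clip.tags) if clip.tags else 0.0


def align(script: str, clips: list) -> list:
    """Greedily place the highest-ranked clips first, one clip per anchor."""
    anchors = split_into_anchors(script)
    # Score every (anchor, clip) pair and sort from most to least relevant.
    pairs = sorted(
        ((relevance(a, c), i, j)
         for i, a in enumerate(anchors)
         for j, c in enumerate(clips)),
        key=lambda p: -p[0],
    )
    slot = {}      # anchor index -> clip index
    used = set()   # clip indices already on the timeline
    for score, i, j in pairs:
        if score > 0 and i not in slot and j not in used:
            slot[i] = j
            used.add(j)
    # Anchors with no relevant clip are left empty (None).
    timeline = []
    for i, a in enumerate(anchors):
        name: Optional[str] = clips[slot[i]].name if i in slot else None
        timeline.append((a, name))
    return timeline


script = (
    "The mayor opened the new bridge on Monday. "
    "Traffic crossed the river within minutes. "
    "Engineers praised the design."
)
clips = [
    Clip("bridge_opening.mp4", {"mayor", "bridge"}),
    Clip("river_traffic.mp4", {"traffic", "river"}),
    Clip("stock_city.mp4", {"city", "skyline"}),
]
timeline = align(script, clips)
for anchor, clip_name in timeline:
    print(clip_name, "<-", anchor)
```

In this toy run the first two sentences each receive the clip whose tags they mention, while the third sentence has no relevant clip and stays empty. A production system would need a fallback for such gaps and would also have to fit clip durations to the length of the synthesized audio, which this sketch ignores.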
VidPress takes up to nine minutes to create a news video, and averages 2.5 minutes for a two-minute 720p video, compared with 15 minutes for human editors.
The promising future of VidPress
From a laboratory project to real-world practice, VidPress needs to scale up. The Baidu Research team took the next step by developing a distributed video synthesis system and associated REST APIs to provide web services for Baidu’s Haokan.
The effort paid off. With a four-GPU setup, VidPress can produce 75 percent of the videos on Haokan on its own, with the most-viewed VidPress video reaching 850,000 views.
Beyond the video quality, scalability, and cost savings that VidPress delivers, Baidu believes AI holds more potential in video synthesis. In the near future, VidPress is expected to tailor video synthesis to clients’ preferred content and format.
Baidu has also built a massive repository of short videos from Haokan, another Baidu video app Quanmin, and the main Baidu App, where short videos account for over 70% of the content distributed. By leveraging Baidu’s abundant data resources, VidPress can also provide more objective and in-depth news videos that give the audience a better understanding of the story.
In this regard, VidPress can not only meet the demands of content production but also promote information neutrality, making VidPress a content resource that everyone loves to watch.