Google Veo for Creative Video: The New Era of Automatic Video Production

Google Veo 3 is a state-of-the-art artificial intelligence model that transforms simple textual prompts into stunningly realistic videos. Available through a Google Gemini Ultra subscription, Veo 3 produces cinematic-quality movies, with characters speaking different languages and richly detailed settings. Online experiments have shown amazing results, but have also revealed difficulties in distinguishing between authentic and artificially created videos, which raises concerns related to fake news. While this technology has the potential to make audiovisual production accessible to all, it also raises important questions about the future of the work of filmmakers and creatives, marking the beginning of a new era in the world of digital video.

“It’s all over.” How many times have we heard this phrase, half seriously and half in jest, whenever artificial intelligence has taken a leap forward? Since the explosion of ChatGPT in November 2022, excitement has been mixed with doubt and concern. And now, with the arrival of Google Veo 3, amazement and anxiety have reached unprecedented levels.

What is Google Veo 3?

Veo 3 is Google’s latest artificial intelligence model, and it has the remarkable ability to turn simple text, such as a sentence written in natural language, into video. And the result? Clips so realistic that they look like something straight out of a film. Not surprisingly, Google talks about “cinematic video.”

Those who have had the opportunity to test Veo 3 (which is currently available only through an expensive Google Gemini Ultra subscription) have been very impressed. Social media are already flooded with videos generated by this system, and many users have shared genuine works of art, the result of the fusion of human creativity and the power of Google’s algorithms.

The impact on the boundary between true and false

The quality of these videos is so high that even experts in the field struggle to distinguish real footage from artificially created footage. This scenario opens up complicated situations, especially regarding the spread of fake news, a problem already evident with AI-generated images.

Some online users have enjoyed making videos that blur reality, questioning what is real and what is digital. There is already talk of a real “gray zone,” where the distinction between physical objects and digital simulations becomes increasingly blurred.

What is changing for the audiovisual industry?

The evolution leading to Veo 3 is striking: in just a few years, we have gone from caricature-like videos (think of the famous AI-created meme of Will Smith eating spaghetti) to results that look incredibly realistic. Not only can this model recreate settings, camera movements, and editing, but it can also make characters speak in different languages and accents, including Italian.

This raises important questions: will we still need directors, actors, technicians, and creators? Or will Veo 3 and its successors fundamentally change the way we produce audiovisual content?

Experiments with Google Veo 3

To really understand the potential of Veo 3, we searched the web for experiments by other users who have created short films.

Americans in Rome

The first experiment is a surreal yet plausible short film made by the editorial staff of Corriere della Sera: an American tourist on vacation in Rome orders two cappuccinos and a “pepperoni” pizza (an American term for a type of spicy salami). After a correction on pronunciation, the waiter brings the food to the table. The tourist then dips a slice of pizza into the cappuccino, a bizarre gastronomic mix that the waiter decides to try in turn.

Five different prompts were used to create as many scenes, relying on Gemini chat and Veo 3 for video generation. Each clip took about three minutes, with the option to regenerate unconvincing scenes.
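The workflow described above (one prompt per scene, with rejected scenes generated again) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `generate_clip` is a hypothetical stand-in and not a real Veo or Gemini API call, the prompts are placeholders rather than the ones actually used, and the three-minute figure is the approximate time reported for the experiment.

```python
from dataclasses import dataclass

MINUTES_PER_CLIP = 3  # approximate generation time reported for each clip


@dataclass
class Clip:
    scene: int
    prompt: str
    attempts: int  # how many times the scene was generated


def generate_clip(prompt: str) -> str:
    """Hypothetical stand-in for a video generator; NOT a real API."""
    return f"clip for: {prompt}"


def produce_short_film(prompts, is_convincing, max_attempts=3):
    """Generate one clip per prompt, regenerating scenes the reviewer rejects."""
    clips = []
    for scene, prompt in enumerate(prompts, start=1):
        attempts = 0
        while True:
            attempts += 1
            video = generate_clip(prompt)
            # Stop when the reviewer accepts the clip or the retry budget runs out.
            if is_convincing(scene, attempts, video) or attempts >= max_attempts:
                break
        clips.append(Clip(scene, prompt, attempts))
    return clips


def estimated_minutes(clips):
    """Rough total generation time, counting every attempt."""
    return sum(clip.attempts for clip in clips) * MINUTES_PER_CLIP


if __name__ == "__main__":
    # Placeholder prompts, one per scene (assumed, not the original prompts).
    prompts = [f"scene {n} prompt" for n in range(1, 6)]
    # Example reviewer: rejects only the first attempt at scene 3.
    reviewer = lambda scene, attempt, video: not (scene == 3 and attempt == 1)
    film = produce_short_film(prompts, reviewer)
    print(len(film), "clips in about", estimated_minutes(film), "minutes")
```

With five scenes and a single regeneration, the sketch counts six generations in total, so even a short film of this kind takes a noticeable amount of waiting time on top of the writing of the prompts.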

The video is very realistic: it shows the Colosseum in the background and the Roman pine trees, both rendered in fine detail. The sound effects make the noisy city, crowded with tourists, very believable. The voices of the protagonists are also well rendered and keep the accents of their home countries. However, some errors are present. For example, in some scenes the tourist appears to be sitting unnaturally, and subtitles appear at one point. Finally, the scenes are not very consistent with each other.

To improve continuity between scenes, Google suggests using Flow, its creative suite that allows you to add new scenes while maintaining visual and narrative consistency.

Weather forecast

In another experiment found online, Veo 3 was asked to create a fake but authentic-looking weather news broadcast describing a rapidly spreading taco invasion in the United States.

The footage is very realistic: the presenter is believable, with fairly accurate lip synchronization. However, some facial distortions are present.

Talking gorilla

Another example is a video of a realistic-looking talking gorilla attending a major English soccer match. In the video, the gorilla, sitting in the stands with other fans, raises a selfie stick and angrily complains to spectators about an unfair decision by the referee.

The result is intriguing: the gorilla looks and moves incredibly realistically, with natural expressions and body movements. However, some background distortions are still evident.

Conclusion: a future full of opportunities and challenges

Google Veo 3 marks a real change in the world of video production, bringing with it enormous creative potential, but also significant challenges, especially on the ethical and professional front.

The results of experiments so far indicate that this technology could make audiovisual content creation accessible to all, allowing anyone to become a filmmaker with simple text commands. However, the difficulty in distinguishing between what is real and what is generated could complicate the battle against fake news and transform the media landscape.

Only time will tell how this new frontier will develop, but one thing is certain: with Veo 3, we have truly entered a new era of digital video.
