Video to Text

Summary

Video to text is performed by connecting a Video Block to a Text Block. As with Image to Text, it can be used to extract information from any video input.

This combination of blocks can be used to describe a video or to produce a numbered list of frames, each with a detailed description of its visual content.

Models

Video to Text Model

Llava, a fine-tuned version of the Qwen model.

Prompt

"Describe this video in a few sentences."
"Describe the composition and focal points of each frame, focusing on how elements are arranged and how they guide the viewer's attention."
"Illustrate the atmosphere of each scene, focusing on sensory details such as lighting, visceral color sensation, and spatial depth."
"Describe how recurring motifs and visual patterns contribute to thematic development in the video."
"Give me a list of the frames in this video and describe the visual composition of each."

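If you reuse the same prompts across several Text Blocks, it can help to keep them in a small script and paste the result into the block. The sketch below is only an illustration: the preset names, the `build_prompt` helper, and the optional frame limit are hypothetical and not part of the product, which takes the prompt as plain text.

```python
from typing import Optional

# Hypothetical preset names mapped to example prompts from this page.
VIDEO_PROMPTS = {
    "summary": "Describe this video in a few sentences.",
    "frames": (
        "Give me a list of the frames in this video "
        "and describe the visual composition of each."
    ),
}


def build_prompt(task: str, max_frames: Optional[int] = None) -> str:
    """Return the preset prompt for `task`, optionally asking for a frame limit."""
    if task not in VIDEO_PROMPTS:
        raise KeyError(f"unknown task: {task!r}")
    prompt = VIDEO_PROMPTS[task]
    if max_frames is not None:
        # Assumption: the limit is expressed as plain text inside the prompt.
        prompt += f" Limit the answer to {max_frames} frames."
    return prompt


if __name__ == "__main__":
    # Print a prompt that can be pasted into the Text Block.
    print(build_prompt("frames", max_frames=5))
```

Whether the model respects a requested frame limit is not guaranteed, so treat the extra sentence as a hint to the model rather than a setting.
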
