MVGaussian

MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification

Under submission

Anonymous authors

Abstract

Text-to-3D content generation has made significant progress in producing realistic 3D objects, with existing methods such as Score Distillation Sampling (SDS) offering promising guidance. However, these methods often encounter the "Janus" problem: multi-face ambiguities caused by imprecise guidance. Additionally, while recent advances in 3D Gaussian splatting have shown its efficacy in representing 3D volumes, the optimization of this representation remains largely unexplored. This paper introduces a unified framework for text-to-3D content generation that addresses both gaps. Our approach uses multi-view guidance to iteratively form the structure of the 3D model, progressively enhancing detail and accuracy. We also introduce a novel densification algorithm that aligns Gaussians close to the surface, improving the structural integrity and fidelity of the generated models. Extensive experiments validate our approach, demonstrating that it produces high-quality visual outputs at low time cost. Notably, our method achieves high-quality results within half an hour of training, a substantial efficiency gain over most existing methods, which require hours of training to achieve comparable results.

Architecture overview

Overview of our MVGaussian framework: Our approach begins by randomly initializing Gaussians within a unit sphere and then refines them iteratively with an SDS-based optimization strategy. Gaussians are optimized to lie near the true surface: they are moved toward the pseudo surface, and those farther away are pruned. Each iteration renders four views at random azimuth angles and encodes them into the latent space. Gaussian noise is added to the latents and then denoised by a U-Net to compute the loss \(\mathcal{L}_{sds}\). The optimization gradient \(\nabla \mathcal{L}_{sds}\) updates the Gaussians, and a feedback loop that uses fused point cloud data with voxel downsampling enhances accuracy.
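Three geometric steps named in the overview (initialization inside a unit sphere, random azimuth sampling for four views, and voxel downsampling of a fused point cloud) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the uniform sampling distributions, the voxel size, and the choice to average the points in each voxel are all assumptions made for illustration, and only the Gaussian centers are modeled.

```python
import numpy as np


def init_gaussian_centers(n, rng):
    """Sample n center positions uniformly inside the unit sphere (assumed distribution)."""
    directions = rng.normal(size=(n, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # The cube root of a uniform variable gives a radius that is uniform in volume.
    radii = rng.uniform(size=(n, 1)) ** (1.0 / 3.0)
    return directions * radii


def sample_azimuths(num_views, rng):
    """Draw one azimuth angle in degrees per rendered view (assumed independent and uniform)."""
    return rng.uniform(0.0, 360.0, size=num_views)


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel by their mean."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # flat index of the voxel each point belongs to
    sums = np.zeros((counts.shape[0], 3))
    np.add.at(sums, inverse, points)  # accumulate point coordinates per voxel
    return sums / counts[:, None]


rng = np.random.default_rng(0)
centers = init_gaussian_centers(1000, rng)   # hypothetical count of Gaussians
azimuths = sample_azimuths(4, rng)           # four views per iteration, as in the overview
fused = voxel_downsample(centers, voxel_size=0.1)  # hypothetical voxel size
```

Averaging within each voxel is only one possible reduction; keeping a single representative point per voxel would serve the same thinning purpose.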

Generated 3D assets from textual prompts

Compared methods, in display order: GaussianDreamer, GSGEN, LucidDreamer, GCS, and MVGaussian (ours). The first prompt is shown for GaussianDreamer, LucidDreamer, and MVGaussian only.

"An armored green-skin orc warrior riding a vicious hog."

"A forbidden castle high up in the mountains."

"A flying dragon, highly detailed, realistic, majestic."

"A 3D model of an adorable cottage with a thatched roof"

"A blue jay sitting on a willow basket of macarons"

"Medieval soldier with shield and sword, fantasy, game, character, highly detailed, photorealistic, 4K, HD"

"Jack Sparrow wearing sunglasses, head, photorealistic, 8k, HD, raw."

"A peacock standing on a surfing board, highly detailed, majestic."

Additional results from MVGaussian

A DSLR photo of the Imperial State Crown of England, highly detailed, realistic, majestic, HD

A DSLR photo of a Schnauzer wearing a pirate hat, highly detailed, realistic, majestic.

Airplane, fighter, steampunk style, ultra realistic, 4k, HD

An opulent couch from the palace of Versailles

Gandalf smiling, white hair, head, photorealistic, 8K, HD.

Joker wearing top hat, head, photorealistic, Fujifilm XT5, 8K, HD, raw.

A portrait of Hatsune Miku as a robot, head, anime, super detailed, best quality, 8K, HD

Michelangelo style statue of a dog reading news on a cellphone.

A furry cat wearing armor, high resolution, highly detailed, photorealistic, nice, 8K, HD

Flamethrower, with fire, scifi, cyberpunk, photorealistic, 8K, HD

A Spanish galleon sailing on the open sea

A furry corgi.

A cute fluffy dog, 4K, HD, raw

A quill and ink sitting on a desk

A woolly mammoth

A bichon frise wearing academic regalia, 8K, HD, raw.

Mesh reconstruction

Because the meshes are textureless, they may appear semi-transparent or incomplete. This is a common issue with HTML viewers and is not caused by the meshes themselves.

"An armored green-skin orc warrior riding a vicious hog."

"A forbidden castle high up in the mountains."

"A flying dragon, highly detailed, realistic, majestic."

"A 3D model of an adorable cottage with a thatched roof"

"A blue jay sitting on a willow basket of macarons"

"Medieval soldier with shield and sword, fantasy, game, character, highly detailed, photorealistic, 4K, HD"

"Jack Sparrow wearing sunglasses, head, photorealistic, 8k, HD, raw."

"A peacock standing on a surfing board, highly detailed, majestic."