Patch 3 #540

Open
wants to merge 20 commits into base: feature/mirror_v1.2
Commits (20)
- 65c61d0 [doc] add v1.2 blog (#517) (binmakeswell, Jun 21, 2024)
- 91ccddc Force fp16 input to fp32 to avoid nan output in timestep_transform (BurkeHulk, Jun 21, 2024); see the sketch after this list
- e8ad745 fix broken links in cn report v1 (liuwenran, Jun 21, 2024)
- 5d1dcf9 Merge pull request #524 from liuwenran/fix_cn_report (zhengzangw, Jun 21, 2024)
- 81b5e76 Merge pull request #523 from BurkeHulk/hotfix/fp16_nan_output (zhengzangw, Jun 21, 2024)
- 019d3de [feat] reduce memory leakage in dataloader and pyav (zhengzangw, Jun 21, 2024)
- 22496a4 Merge branch 'main' of https://github.com/hpcaitech/Open-Sora-Dev int… (zhengzangw, Jun 21, 2024)
- 49d5edd [fix] support stdit1 training (zhengzangw, Jun 21, 2024)
- 30cac7a Merge branch 'main' of https://github.com/hpcaitech/Open-Sora into de… (zhengzangw, Jun 21, 2024)
- 22c4707 [fix] time list (zhengzangw, Jun 21, 2024)
- 1b79ec3 minor fix (zhengzangw, Jun 21, 2024)
- 878dd99 Merge pull request #526 from hpcaitech/fix/memory_leak (zhengzangw, Jun 22, 2024)
- d74ef76 [docs] update tutorial (zhengzangw, Jun 22, 2024)
- 3a09184 Merge pull request #529 from hpcaitech/docs/luchen (zhengzangw, Jun 22, 2024)
- e92672b handle av error (zhengzangw, Jun 22, 2024)
- 8ccd152 Merge pull request #530 from hpcaitech/hotfix/read (zhengzangw, Jun 22, 2024)
- b3f7df8 [fix] better support local ckpt (zhengzangw, Jun 22, 2024)
- a6036e4 Merge pull request #531 from hpcaitech/hotfix/hf-load (zhengzangw, Jun 22, 2024)
- 0312a0d fix SeqParallelMultiHeadCrossAttention for consistent results in dist… (Kipsora, Jun 24, 2024)
- d2d6dd9 Update README.md (guangxiangyang, Jun 25, 2024)
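Commit 91ccddc addresses NaNs that can appear when timestep_transform runs its arithmetic in fp16: half precision lacks the range and precision for the intermediate values, so the input is up-cast to fp32 first. A minimal sketch of that cast pattern (the transform formula below is a placeholder, not the project's actual function; only the fp32 round-trip is the point):

```python
import torch

def timestep_transform_fp32(t: torch.Tensor, ratio: float = 2.0) -> torch.Tensor:
    """Illustrative only: do the math in fp32 even when the pipeline runs in fp16,
    then cast back, so intermediate values cannot overflow or underflow to NaN."""
    orig_dtype = t.dtype
    t32 = t.float()                                   # fp16 -> fp32
    out = ratio * t32 / (1.0 + (ratio - 1.0) * t32)   # placeholder transform
    return out.to(orig_dtype)

# usage sketch
t = torch.rand(4, dtype=torch.float16)
print(timestep_transform_fp32(t))
```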
8 changes: 3 additions & 5 deletions README.md
@@ -20,11 +20,11 @@ Open-Sora not only democratizes access to advanced video generation techniques,
streamlined and user-friendly platform that simplifies the complexities of video generation.
With Open-Sora, our goal is to foster innovation, creativity, and inclusivity within the field of content creation.

[[中文文档]](/docs/zh_CN/README.md) [[潞晨云部署视频教程]](https://www.bilibili.com/video/BV141421R7Ag)
[[中文文档](/docs/zh_CN/README.md)] [[潞晨云](https://cloud.luchentech.com/)|[OpenSora镜像](https://cloud.luchentech.com/doc/docs/image/open-sora/)|[视频教程](https://www.bilibili.com/video/BV1ow4m1e7PX/?vd_source=c6b752764cd36ff0e535a768e35d98d2)]

## 📰 News

- **[2024.06.17]** 🔥 We released **Open-Sora 1.2**, which includes **3D-VAE**, **rectified flow**, and **score condition**. The video quality is greatly improved. [[checkpoints]](#open-sora-10-model-weights) [[report]](/docs/report_03.md)
- **[2024.06.17]** 🔥 We released **Open-Sora 1.2**, which includes **3D-VAE**, **rectified flow**, and **score condition**. The video quality is greatly improved. [[checkpoints]](#open-sora-10-model-weights) [[report]](/docs/report_03.md) [[blog]](https://hpc-ai.com/blog/open-sora-from-hpc-ai-tech-team-continues-open-source-generate-any-16-second-720p-hd-video-with-one-click-model-weights-ready-to-use)
- **[2024.04.25]** 🤗 We released the [Gradio demo for Open-Sora](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face Spaces.
- **[2024.04.25]** We released **Open-Sora 1.1**, which supports **2s~15s, 144p to 720p, any aspect ratio** text-to-image, **text-to-video, image-to-video, video-to-video, infinite time** generation. In addition, a full video processing pipeline is released. [[checkpoints]]() [[report]](/docs/report_02.md)
- **[2024.03.18]** We released **Open-Sora 1.0**, a fully open-source project for video generation.
@@ -38,16 +38,14 @@ With Open-Sora, our goal is to foster innovation, creativity, and inclusivity within the field of content creation.

## 🎥 Latest Demo

🔥 You can experience Open-Sora on our [🤗 Gradio application on Hugging Face](https://huggingface.co/spaces/hpcai-tech/open-sora). More samples are available in our [Gallery](https://hpcaitech.github.io/Open-Sora/).

🔥 You can experience Open-Sora on our [🤗 Gradio application on Hugging Face](https://huggingface.co/spaces/hpcai-tech/open-sora). More samples and corresponding prompts are available in our [Gallery](https://hpcaitech.github.io/Open-Sora/).

| **4s 720×1280** | **4s 720×1280** | **4s 720×1280** |
| ---------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| [<img src="assets/demo/v1.2/sample_0013.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/7895aab6-ed23-488c-8486-091480c26327) | [<img src="assets/demo/v1.2/sample_1718.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/20f07c7b-182b-4562-bbee-f1df74c86c9a) | [<img src="assets/demo/v1.2/sample_0087.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/3d897e0d-dc21-453a-b911-b3bda838acc2) |
| [<img src="assets/demo/v1.2/sample_0052.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/644bf938-96ce-44aa-b797-b3c0b513d64c) | [<img src="assets/demo/v1.2/sample_1719.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/272d88ac-4b4a-484d-a665-8d07431671d0) | [<img src="assets/demo/v1.2/sample_0002.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/ebbac621-c34e-4bb4-9543-1c34f8989764) |
| [<img src="assets/demo/v1.2/sample_0011.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/a1e3a1a3-4abd-45f5-8df2-6cced69da4ca) | [<img src="assets/demo/v1.2/sample_0004.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/d6ce9c13-28e1-4dff-9644-cc01f5f11926) | [<img src="assets/demo/v1.2/sample_0061.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/561978f8-f1b0-4f4d-ae7b-45bec9001b4a) |


<details>
<summary>OpenSora 1.1 Demo</summary>

58 changes: 58 additions & 0 deletions configs/opensora-v1-2/train/demo_360p.py
@@ -0,0 +1,58 @@
# Dataset settings
dataset = dict(
type="VariableVideoTextDataset",
transform_name="resize_crop",
)

# webvid
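# Assumed bucket format (matching other Open-Sora bucket configs):
# {resolution: {num_frames: (sampling probability, batch size)}},
# i.e. 102-frame 360p clips are always kept (p=1.0) and batched 5 at a time.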
bucket_config = {"360p": {102: (1.0, 5)}}
grad_checkpoint = True

# Acceleration settings
num_workers = 8
num_bucket_build_workers = 16
dtype = "bf16"
plugin = "zero2"

# Model settings
model = dict(
type="STDiT3-XL/2",
from_pretrained=None,
qk_norm=True,
enable_flash_attn=True,
enable_layernorm_kernel=True,
freeze_y_embedder=True,
)
vae = dict(
type="OpenSoraVAE_V1_2",
from_pretrained="hpcai-tech/OpenSora-VAE-v1.2",
micro_frame_size=17,
micro_batch_size=4,
)
text_encoder = dict(
type="t5",
from_pretrained="DeepFloyd/t5-v1_1-xxl",
model_max_length=300,
shardformer=True,
)
scheduler = dict(
type="rflow",
use_timestep_transform=True,
sample_method="logit-normal",
)

# Log settings
seed = 42
outputs = "outputs"
wandb = False
epochs = 1000
log_every = 10
ckpt_every = 200

# optimization settings
load = None
grad_clip = 1.0
lr = 1e-4
ema_decay = 0.99
adam_eps = 1e-15
warmup_steps = 1000
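For readers unfamiliar with this config style: the file above is a plain Python module whose top-level names (dataset, bucket_config, model, lr, ...) become the training settings. A minimal, generic loader sketch, not Open-Sora's actual config utilities:

```python
import importlib.util

def load_py_config(path: str) -> dict:
    """Import a .py config file as a module and collect its top-level settings."""
    spec = importlib.util.spec_from_file_location("cfg", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return {k: v for k, v in vars(module).items() if not k.startswith("_")}

# usage sketch
cfg = load_py_config("configs/opensora-v1-2/train/demo_360p.py")
print(cfg["bucket_config"], cfg["lr"])
```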
@@ -9,7 +9,7 @@
grad_checkpoint = True

# Acceleration settings
num_workers = 0
num_workers = 8
num_bucket_build_workers = 16
dtype = "bf16"
plugin = "zero2"
Expand Down Expand Up @@ -41,21 +41,6 @@
sample_method="logit-normal",
)

# Mask settings
# 25%
mask_ratios = {
"random": 0.01,
"intepolate": 0.002,
"quarter_random": 0.002,
"quarter_head": 0.002,
"quarter_tail": 0.002,
"quarter_head_tail": 0.002,
"image_random": 0.0,
"image_head": 0.22,
"image_tail": 0.005,
"image_head_tail": 0.005,
}

# Log settings
seed = 42
outputs = "outputs"
12 changes: 6 additions & 6 deletions docs/zh_CN/report_v1.md
@@ -11,11 +11,11 @@ OpenAI's Sora is excellent at generating one-minute high-quality videos. However, it…
As shown in the figure, in STDiT (ST stands for spatial-temporal) we insert a temporal attention immediately after each spatial attention. This is similar to variant 3 in the Latte paper; however, we do not control for a comparable number of parameters across these variants. While the Latte paper claims its variant is better than variant 3, our experiments on 16x256x256 videos show that with the same number of iterations the performance ranks as DiT (full) > STDiT (sequential) > STDiT (parallel) ≈ Latte. Out of efficiency considerations we therefore chose STDiT (sequential). A speed benchmark is provided [here](/docs/acceleration.md#efficient-stdit).


![Architecture Comparison](https://i0.imgs.ovh/2024/03/15/eLk9D.png)
![Architecture Comparison](/assets/readme/report_arch_comp.png)

To focus on video generation, we want to train the model starting from a strong image generation model. PixArt-α is an efficiently trained, high-quality image generation model with a T5-conditioned DiT structure. We initialize our model with [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) and initialize the projection layer of each inserted temporal attention to zero. This initialization preserves the model's image generation ability at the start of training, whereas Latte's architecture cannot. The inserted attention increases the parameter count from 580M to 724M.
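To make the zero-initialization idea concrete, here is an illustrative PyTorch sketch (not the actual STDiT code; the module names, tensor layout, and use of nn.MultiheadAttention are assumptions). Because the temporal attention's output projection starts at zero, the block initially reproduces the pretrained spatial-only (image) behavior:

```python
import torch
import torch.nn as nn

class STDiTBlockSketch(nn.Module):
    """Illustrative only: spatial attention followed by temporal attention,
    with the temporal output projection zero-initialized so the block starts
    out equivalent to the pretrained image (spatial-only) model."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Zero-init: at step 0 the temporal branch contributes nothing,
        # preserving the PixArt-α image prior.
        nn.init.zeros_(self.temporal_attn.out_proj.weight)
        nn.init.zeros_(self.temporal_attn.out_proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, S, C) = batch, frames, spatial tokens, channels
        b, t, s, c = x.shape
        # Spatial attention over tokens within each frame
        xs = x.reshape(b * t, s, c)
        xs = xs + self.spatial_attn(xs, xs, xs, need_weights=False)[0]
        x = xs.reshape(b, t, s, c)
        # Temporal attention over frames at each spatial location
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, c)
        xt = xt + self.temporal_attn(xt, xt, xt, need_weights=False)[0]
        return xt.reshape(b, s, t, c).permute(0, 2, 1, 3)

# usage sketch
block = STDiTBlockSketch(dim=1152, num_heads=16)
x = torch.randn(2, 16, 64, 1152)  # 2 clips, 16 frames, 64 spatial tokens
print(block(x).shape)             # torch.Size([2, 16, 64, 1152])
```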

![Architecture](https://i0.imgs.ovh/2024/03/16/erC1d.png)
![Architecture](/assets/readme/report_arch.jpg)

Drawing on the success of PixArt-α and Stable Video Diffusion, we also adopt a progressive training strategy: 16x256x256 training on a 366K pretraining dataset, followed by 16x256x256, 16x512x512, and 64x512x512 training on a 20K dataset. With scaled position embeddings, this strategy greatly reduces the computational cost.

@@ -26,7 +26,7 @@ OpenAI's Sora is excellent at generating one-minute high-quality videos. However, it…

We find that the quantity and quality of data have a large impact on the quality of the generated videos, even larger than the model architecture and training strategy. At present we have only prepared the first split (366K video clips) from [HD-VG-130M](https://github.com/daooshee/HD-VG-130M). The quality of these videos is uneven and the captions are not accurate enough, so we further collected 20K relatively high-quality videos from [Pexels](https://www.pexels.com/), which provides free-license videos. We label the videos with LLaVA, an image captioning model, using three frames and a carefully designed prompt. With the designed prompt, LLaVA is able to generate high-quality captions.

![Caption](https://i0.imgs.ovh/2024/03/16/eXdvC.png)
![Caption](/assets/readme/report_caption.png)

Since we put more emphasis on data quality, we plan to collect more data and build a video preprocessing pipeline in the next version.

@@ -38,12 +38,12 @@ OpenAI's Sora is excellent at generating one-minute high-quality videos. However, it…

16x256x256 pretraining loss curve

![16x256x256 Pretraining Loss Curve](https://i0.imgs.ovh/2024/03/16/erXQj.png)
![16x256x256 Pretraining Loss Curve](/assets/readme/report_loss_curve_1.png)

16x256x256 HQ training loss curve

![16x256x256 HQ Training Loss Curve](https://i0.imgs.ovh/2024/03/16/ernXv.png)
![16x256x256 HQ Training Loss Curve](/assets/readme/report_loss_curve_2.png)

16x512x512 HQ training loss curve

![16x512x512 HQ Training Loss Curve](https://i0.imgs.ovh/2024/03/16/erHBe.png)
![16x512x512 HQ Training Loss Curve](/assets/readme/report_loss_curve_3.png)
2 changes: 1 addition & 1 deletion gradio/README.md
@@ -1,4 +1,4 @@
---
gaungxiangyang---
title: Open Sora
emoji: 🎥
colorFrom: red
4 changes: 4 additions & 0 deletions opensora/datasets/dataloader.py
@@ -34,6 +34,7 @@ def prepare_dataloader(
process_group: Optional[ProcessGroup] = None,
bucket_config=None,
num_bucket_build_workers=1,
prefetch_factor=None,
**kwargs,
):
_kwargs = kwargs.copy()
@@ -57,6 +58,7 @@
pin_memory=pin_memory,
num_workers=num_workers,
collate_fn=collate_fn_default,
prefetch_factor=prefetch_factor,
**_kwargs,
),
batch_sampler,
@@ -79,6 +81,7 @@
pin_memory=pin_memory,
num_workers=num_workers,
collate_fn=collate_fn_default,
prefetch_factor=prefetch_factor,
**_kwargs,
),
sampler,
@@ -98,6 +101,7 @@
pin_memory=pin_memory,
num_workers=num_workers,
collate_fn=collate_fn_batch,
prefetch_factor=prefetch_factor,
**_kwargs,
),
sampler,
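The new prefetch_factor argument is forwarded straight to torch.utils.data.DataLoader, where it sets how many batches each worker keeps decoded ahead of the training loop (PyTorch's default is 2 per worker, and the option only applies when num_workers > 0). Exposing it lets a config bound the backlog when each batch is a large decoded video clip. A generic sketch of the knob, not Open-Sora's call site (the dataset here is a placeholder):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the real video dataset.
dataset = TensorDataset(torch.zeros(64, 3, 360, 640))

loader = DataLoader(
    dataset,
    batch_size=4,
    num_workers=8,
    pin_memory=True,
    prefetch_factor=1,  # keep at most 1 batch pre-decoded per worker (default is 2)
)

for (batch,) in loader:
    pass  # training step would go here
```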
4 changes: 3 additions & 1 deletion opensora/datasets/datasets.py
@@ -151,9 +151,11 @@ def getitem(self, index):

# Sampling video frames
video = temporal_random_crop(vframes, num_frames, self.frame_interval)
video = video.clone()
del vframes

video_fps = video_fps // self.frame_interval

# transform
transform = get_transforms_video(self.transform_name, (height, width))
video = transform(video) # T C H W
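The clone-then-del change above is the memory-leak fix: if temporal_random_crop returns a slice or view of vframes (which the added clone() suggests), holding the crop keeps every decoded frame alive. Copying only the kept frames and dropping the reference to vframes lets the full tensor be freed as soon as the sample is built. A generic illustration of the pattern, with a stubbed decoder standing in for the real pyav-based reader:

```python
import torch

def decode_video(path: str) -> torch.Tensor:
    """Stand-in for the real reader: pretend we decoded a 300-frame clip."""
    return torch.zeros(300, 3, 360, 640)

def load_cropped_clip(path: str, start: int, length: int) -> torch.Tensor:
    vframes = decode_video(path)  # full clip, (T, C, H, W)

    # A plain slice is a view that shares storage with the full clip, so
    # returning it would keep all 300 decoded frames alive for as long as the
    # dataloader holds the sample. clone() copies only the frames we keep, and
    # del drops the last reference so the full tensor can be freed right away.
    video = vframes[start : start + length].clone()
    del vframes
    return video

# usage sketch
clip = load_cropped_clip("video.mp4", start=10, length=102)
print(clip.shape)  # torch.Size([102, 3, 360, 640])
```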