
the dimension of pre_status and cur_status NOT equal in leave_one_out_stage2 #70

husthzy opened this issue Nov 29, 2024 · 7 comments

husthzy commented Nov 29, 2024

I trained with my own dataset.

```bash
python leave_one_out_stage2.py -s data/realcap/mario -m output/gs_init/mario_loo -r 8 --sparse_view_num 4 --sh_degree 2 --init_pcd_name dust3r_4 --dust3r_json output/gs_init/mario/refined_cams.json --white_background --random_background --use_dust3r
```

This is the error info:

```
[ITER 7000] Evaluating train: L1 0.005111759598366916 PSNR 30.6793571472168 [29/11 17:37:02]
Training progress: 100%|██████████████████████████████████████████████| 24000/24000 [04:38<00:00, 86.09it/s, Loss=0.0032718, n=29527]

[ITER 30000] Saving Gaussians [29/11 17:41:22]
key:_xyz [29/11 17:41:22]
pre_c shape:torch.Size([29667, 3]) [29/11 17:41:22]
cur_c shape:torch.Size([29527, 3]) [29/11 17:41:22]
Traceback (most recent call last):
  File "/home/code/GaussianObject/leave_one_out_stage2.py", line 328, in <module>
    train_3dgs(args, ids)
  File "/home/code/GaussianObject/leave_one_out_stage2.py", line 280, in train_3dgs
    leave_one_out_training(args,
  File "/home/code/GaussianObject/leave_one_out_stage2.py", line 136, in leave_one_out_training
    diff = pre_c - cur_c
RuntimeError: The size of tensor a (29667) must match the size of tensor b (29527) at non-singleton dimension 0
```


7rwang commented Dec 2, 2024

Has this been fixed? I sometimes run into the same error too.


husthzy commented Dec 3, 2024

> Has this been fixed? I sometimes run into the same error too.

Not yet. I am trying to reinstall the environment.


husthzy commented Dec 3, 2024

Setting iterations = 10_000 works fine.
Setting iterations = 30_000 raises the error.

I don't know the reason yet.


7rwang commented Dec 5, 2024

> Setting iterations = 10_000 works fine. Setting iterations = 30_000 raises the error.
>
> I don't know the reason yet.

It's weird, because from my perspective pruning is random, so it's hard to keep the number of Gaussians unchanged across the two stages.


husthzy commented Dec 6, 2024

> > Setting iterations = 10_000 works fine. Setting iterations = 30_000 raises the error.
> > I don't know the reason yet.
>
> It's weird, because from my perspective pruning is random, so it's hard to keep the number of Gaussians unchanged across the two stages.

Since pruning will occur, why does the code subtract the two statuses directly?

I added a check to force the dimensions to be consistent, and the training worked fine, but I can't confirm whether this is correct.
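The check I added is roughly the sketch below, replacing the failing line in leave_one_out_training (this is only my workaround; truncating both tensors to the smaller point count is arbitrary and may not be the intended behaviour):

```python
# Workaround sketch (not verified to be correct): densify/prune can change the
# number of Gaussians, so pre_c and cur_c may hold different point counts.
# Truncate both to the smaller count before taking the difference.
n = min(pre_c.shape[0], cur_c.shape[0])
diff = pre_c[:n] - cur_c[:n]
```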

@UranusITS

In the leave-one-out Stage 1, we save the 3DGS checkpoint using the checkpoint_iterations argument, which is fixed to 6000.

```python
parser.add_argument("--checkpoint_iterations", nargs="+", type=int, default=[6000])
```

In Stage 2, we load that checkpoint file directly, which causes this unexpected issue when the number of iterations is changed. Relevant lines:

```python
first_iter = 6000 # in this code, we just use the data from 6000 iter
```

```python
args.start_checkpoint = os.path.join(args.model_path, 'chkpnt6000.pth') # load this ckpt
```

Apologies for not fully considering the case when changing the number of iterations.
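If you do need a different total iteration count, one possible (untested) tweak is to keep the hard-coded checkpoint iteration in a single place so both lines stay consistent with the value passed to --checkpoint_iterations in Stage 1, e.g.:

```python
# Hypothetical tweak: derive both the start iteration and the checkpoint path
# from one value, matching the --checkpoint_iterations used in Stage 1.
ckpt_iter = 6000
first_iter = ckpt_iter
args.start_checkpoint = os.path.join(args.model_path, f'chkpnt{ckpt_iter}.pth')
```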

@chensjtu (Owner)

exactly
