Possible GT vertex / image misalignment in val split

#3
by sunj - opened

Hi! While training on usm3d/hoho22k_2026_trainval, I noticed that for some validation buildings the wf_vertices don't line up with the images when projected using the cameras that come with the sample.

To project, I used the cameras inside the colmap zip (via pycolmap.Reconstruction(...).images[i].cam_from_world()), since those are the cameras available for every view (the top-level K/R/t are zero when pose_only_in_colmap == True).
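For reference, the projection step is roughly the following. This is a minimal sketch, not my exact script: it assumes an undistorted pinhole model, and `project_points` is just an illustrative helper name. `R` and `t` are the world-to-camera rotation and translation (for example taken from the image's `cam_from_world` pose), and `K` is the 3x3 intrinsics matrix. How you pull these out of pycolmap depends on the pycolmap version, so treat that part as an assumption.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world points to pixel coordinates with x ~ K (R X + t).

    Assumes a plain pinhole camera (lens distortion is ignored).
    Returns (uv, in_front): uv is Nx2 with NaN for points behind the camera,
    in_front is a boolean mask of points with positive depth.
    """
    points_world = np.asarray(points_world, dtype=float)
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3)

    cam = points_world @ R.T + t           # world frame -> camera frame
    in_front = cam[:, 2] > 1e-9            # keep only positive depth

    uv = np.full((len(cam), 2), np.nan)
    proj = cam[in_front] @ K.T             # apply intrinsics
    uv[in_front] = proj[:, :2] / proj[:, 2:3]  # perspective divide
    return uv, in_front
```

The same function is applied to both wf_vertices (red) and colmap.points3D (green), so any offset between the two overlays comes from the 3D coordinates, not from the camera used.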

Example: building index 77 in the validation split. Red dots are the projected wf_vertices, green dots are the projected colmap.points3D:

bldg077_BAD

The green points cover the building nicely (as expected since they came from these images), but the red vertices sit visibly below / off the actual roof edges — not something that occlusion alone explains.

Doing the same for every validation building and measuring the 3D distance from each wf_vertex to the nearest colmap.points3D.xyz:

overview

For 6 / 170 val buildings the median GT→COLMAP distance is > 1 m:

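| val index | median dist (m) | max dist (m) |
|-----------|-----------------|--------------|
| 133       | 1173.32         | 1182.02      |
| 119       | 18.26           | 21.22        |
| 100       | 15.53           | 27.53        |
| 127       | 2.08            | 3.93         |
| 77        | 1.41            | 5.44         |
| 149       | 1.11            | 2.52         |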
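The per-building statistic in the table can be computed along these lines. This is a sketch under assumptions: `nearest_distances` and `building_stats` are illustrative names, and the inputs are plain Nx3 / Mx3 arrays holding wf_vertices and the colmap.points3D xyz coordinates. It uses brute-force NumPy in small chunks; for large point clouds a KD-tree such as `scipy.spatial.cKDTree` would likely be faster.

```python
import numpy as np

def nearest_distances(vertices, points, chunk=32):
    """Distance from each vertex (Nx3) to its nearest neighbour in points (Mx3)."""
    vertices = np.asarray(vertices, dtype=float)
    points = np.asarray(points, dtype=float)
    out = np.empty(len(vertices))
    # Process vertices in chunks to bound the size of the (chunk, M, 3) array.
    for start in range(0, len(vertices), chunk):
        block = vertices[start:start + chunk]
        diff = block[:, None, :] - points[None, :, :]
        out[start:start + chunk] = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return out

def building_stats(vertices, points):
    """Median and max nearest-neighbour distance for one building."""
    d = nearest_distances(vertices, points)
    return float(np.median(d)), float(d.max())
```

A building is counted as suspicious when the median returned by `building_stats` exceeds 1 m.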

Per-view projection overlays for all 6 are here, with 4 normal ones for comparison:
bldg077_BAD
bldg100_BAD
bldg119_BAD
bldg127_BAD
bldg133_BAD
bldg149_BAD

This one is less severe than the ones above, but it is still shifted.
bldg006_OK

My guess is wf_vertices might be in the BPO frame while colmap.points3D is in the SfM frame, and for these 6 buildings the two frames aren't close.

A couple of questions:

  1. Is this a known issue, or am I doing something wrong in how I'm getting the cameras?
  2. If wf_vertices really is in a different frame, is there a per-sample transform we're supposed to apply that I missed?
  3. Is the same mismatch present in the train and test (leaderboard) splits?

Thanks!

Urban Scene Modeling Competition CVPR 2026 (Image Track) org

Hi,

Some of the colmap cameras are wrong; that could be the case in all parts of the dataset (train, val, test).
The poses from the non-colmap (K, R, t) together with wf_vertices should be consistent, but some of the colmap reconstructions are not, either because the whole reconstruction is wrong or because the registration failed.
We have done our best to clean those cases as much as possible, but some of the wrong reconstructions remain.
It is up to you how to process such scenes. Hopefully there are too few of them to influence the final results.

--
Best, Dmytro.
