[84] | Gradient smoothness & photometric loss | First unsupervised training with image pairs | Low accuracy, occlusion issues, local optima |
[85] | Left-right consistency | Solve occlusion problem | Texture-copying artifacts |
[86] | Pyramid processing | Mitigate the impact of objects at multiple scales | Depth holes in low-scale disparity maps |
[87] | Local plane parameter module | Address local differentiability | Limited dataset usage |
[88] | Dense feature fusion | Smoother depth maps with clearer edges | Limited dataset usage |
[92] | Coupled depth-pose network | First video-based unsupervised framework | Struggle with dynamic objects and occlusion |
[93] | Multi-scale appearance matching loss | Reduce depth artifacts | Require known camera intrinsics |
[95] | Geometry-aware self-discovered mask | Solve scale inconsistency & occlusion | Lack absolute scale factor |
[96] | Overlapping & blank masks | Improve accuracy | Poor in dynamic scenes |
[97] | Joint estimation with visual odometry (VO) | Enhance dynamic object estimation | Scale ambiguity |
[98] | Depth-pose consistency loss & attention | Reduce scale ambiguity | Complex model |
[100] | 3D geometric constraints | No need for camera parameters | Ignore moving objects |
[101] | Minimal instance photometric residual | Remove dynamic object influence | Depth details still need improvement |
[102] | Occlusion-aware feature fusion | Integrate spatial context | ‒ |
[104] | Direct VO as pose predictor | Solve scale ambiguity | Poor for deformable objects |
[106] | Competitive collaboration framework | Integrate multiple tasks | Scale inconsistency |
[107] | Global loss constraint | Solve scale inconsistency | ‒ |
[108] | Cross-view completion | Improve generalization | High computation cost |
[109] | Shared feature pyramid | Reduce network parameters | Ignore sequence context |
[110] | ConvLSTM | Utilize sequence context | ‒ |
[111] | 3D packing-unpacking blocks | Preserve fine details | Depend on accurate speed measurements |
[112] | Recurrent spatiotemporal network | Enhance self-motion estimation | High computation cost |
[114] | Joint framework (GeoNet) | Reduce occlusion, blur, and dynamic object effects | Optical flow depends on depth & pose networks |
[115] | Independent optical flow network | Prevent error propagation | Occlusion issues |
[116] | Multi-task joint estimation | Mitigate occlusion | Limited scene flow accuracy |
[117] | Independent motion estimation network | More accurate scene flow | Complex structure |
[118] | Rigid vs. full scene flow | Reduce dynamic object influence | Limited by brightness constancy assumption |
[121] | Semantic-guided depth (SGDepth) | Reduce dynamic object impact | Require known intrinsics |
[123] | Pixel-level semantic priors | Improve depth consistency | Limited receptive field |
[124] | Shared encoder with multi-task feature extraction | Enhance weak-texture estimation | ‒ |
[126] | Neighboring frames as auxiliary input | No need for calibrated videos | Sensitive to noise & local minima |
[127] | Self-cross-attention layers | Improve feature matching | Ignore dynamic objects |
[128] | Dynamic object motion disentanglement | Reduce dynamic object effects | Struggle in low-texture scenes |
[129] | Learnable PatchMatch | Improve performance in low-texture or brightness variations | Underutilize multi-view geometry |
[130] | Mono-depth as geometric prior for multi-view cost volume | Robust under multi-view ambiguity | ‒ |
[132] | GAN-based depth estimation | First GAN-based unsupervised approach | Limited accuracy |
[133] | GAN for depth & pose | Improve accuracy | Fail on dynamic objects |
[134] | GAN with temporal correlation | Solve depth blur & pose inaccuracy | Affected by occlusion & view changes |
[135] | Boolean masking | Mitigate occlusion & view changes | Ineffective in low-light |
[136] | Adversarial domain adaptation | Enhance low-light performance | Struggle with highlights & shadows |
[137] | Domain separation network | Handle illumination changes | Poor for single day-night scenes |
[138] | Multi-scale GAN | Improve depth resolution | High computation cost |
[139] | Diffusion-based model | Enhance network stability | ‒ |
[140] | Lightweight model | Significantly reduce computation | ‒ |
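
Many of the methods summarized above build on the combination listed for [84]: a photometric loss between a target image and its reconstruction from another view, plus a smoothness term on the predicted disparity. The following is a minimal, generic sketch of that combination, not the exact formulation of [84] or of any other cited work. The function names, the L1 form of the photometric term, the edge-aware weighting `exp(-|image gradient|)`, and the default `smooth_weight` are illustrative assumptions, and the images are plain nested lists of grayscale values to keep the example dependency-free.

```python
import math


def photometric_l1(target, reconstructed):
    """Mean absolute intensity difference between two equal-size grayscale images."""
    total, count = 0.0, 0
    for row_t, row_r in zip(target, reconstructed):
        for t, r in zip(row_t, row_r):
            total += abs(t - r)
            count += 1
    return total / count


def edge_aware_smoothness(disparity, image):
    """Mean disparity gradient, down-weighted where the image has strong gradients.

    Assumes both inputs are at least 2x2 and have the same shape.
    """
    h, w = len(disparity), len(disparity[0])
    sum_x, n_x, sum_y, n_y = 0.0, 0, 0.0, 0
    # Horizontal neighbours
    for y in range(h):
        for x in range(w - 1):
            d = abs(disparity[y][x + 1] - disparity[y][x])
            g = abs(image[y][x + 1] - image[y][x])
            sum_x += d * math.exp(-g)
            n_x += 1
    # Vertical neighbours
    for y in range(h - 1):
        for x in range(w):
            d = abs(disparity[y + 1][x] - disparity[y][x])
            g = abs(image[y + 1][x] - image[y][x])
            sum_y += d * math.exp(-g)
            n_y += 1
    return sum_x / n_x + sum_y / n_y


def total_loss(target, reconstructed, disparity, smooth_weight=0.1):
    """Photometric term plus weighted smoothness term (the weight is an assumed value)."""
    return photometric_l1(target, reconstructed) + smooth_weight * edge_aware_smoothness(
        disparity, target
    )
```

In a real training pipeline the reconstructed image would come from warping a second view with the predicted disparity or depth and pose. That warping step is omitted here, and it is where the occlusion and dynamic-object limitations listed in the table arise.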