2Hunan Provincial Key Laboratory of Optic-Electronic Intelligent Measurement and Control
3Zhejiang University
4School of Engineering, Westlake University
5State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
6National Engineering Research Center of Visual Technology, School of Computer Science, Peking University
†Co-corresponding authors
Abstract
Polarimetric imaging aims to acquire surface polarization characteristics, such as the
Degree of Linear Polarization (DoLP) and the Angle of Polarization (AoP). In mainstream
Division-of-Focal-Plane (DoFP) color polarization imaging, reconstructing polarization
parameters from captured mosaic arrays remains a challenging inverse problem. Existing
DoFP cameras are also limited by hardware bottlenecks and often cannot provide
high-frame-rate acquisition, which restricts the use of polarimetric imaging in dynamic
video tasks. These limitations motivate the joint enhancement of spatial and temporal
resolution. To this end, we propose the first space-time polarization video reconstruction
architecture. The proposed method performs unified spatiotemporal modeling of polarization
directions and uses a polarization-aware implicit neural representation to achieve
continuous and high-fidelity upsampling. By analyzing temporal variations in polarization
parameters, we further introduce a flow-guided polarization variation loss to supervise
polarization dynamics. In addition, we establish the first large-scale color DoFP
polarization video benchmark to support this research direction. Extensive experiments on
the proposed benchmark demonstrate the effectiveness of the proposed method.
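The DoLP and AoP mentioned above are standard quantities derived from the linear Stokes parameters, which in turn come from the four polarizer-orientation intensities a DoFP sensor captures. A minimal NumPy sketch (function name and the small epsilon guard are illustrative, not from the paper):

```python
import numpy as np

def stokes_from_angles(i0, i45, i90, i135, eps=1e-8):
    """Derive linear Stokes parameters, DoLP and AoP from the four
    polarizer-orientation intensity images of a DoFP sensor."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total (unpolarized) intensity
    s1 = i0 - i90                        # 0 deg vs 90 deg component
    s2 = i45 - i135                      # 45 deg vs 135 deg component
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)  # degree of linear polarization
    aop = 0.5 * np.arctan2(s2, s1)       # angle of polarization, radians
    return s0, s1, s2, dolp, aop
```

For example, light fully linearly polarized at 0 degrees (i0 = 1, i45 = i135 = 0.5, i90 = 0) yields DoLP near 1 and AoP of 0.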
Overview
The overall pipeline of the proposed method is shown in the overview figure. Following
standard continuous STVSR frameworks, PolarVSR synthesizes a high-resolution frame at
arbitrary time from a pair of adjacent mosaic arrays. Unlike standard STVSR, the
proposed method jointly models all four polarization directions by concatenating them
channel-wise for unified feature processing, allowing the network to learn intrinsic
cross-direction dependencies under degradation.
The unpolarized intensity is used to derive the inter-frame motion field for accurate
motion estimation. Intensity and polarization representations are sampled by the
polarization-aware implicit neural representation (PAINR), warped to the target time,
refined by the motion-compensated feature refinement (MCFR) block, and decoded to
reconstruct the high-resolution color polarization output.
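The channel-wise concatenation of the four polarization directions described above can be sketched for a monochrome DoFP mosaic as follows; the 2x2 polarizer layout assumed here (90, 45 / 135, 0) is the common Sony-style arrangement, and the actual layout of the camera used in the paper may differ:

```python
import numpy as np

def split_dofp_mosaic(mosaic):
    """Split a monochrome DoFP mosaic into four half-resolution
    sub-images, one per polarizer orientation, and stack them
    channel-wise for joint four-direction feature processing.
    Assumes a (90, 45 / 135, 0) 2x2 superpixel layout."""
    i90  = mosaic[0::2, 0::2]
    i45  = mosaic[0::2, 1::2]
    i135 = mosaic[1::2, 0::2]
    i0   = mosaic[1::2, 1::2]
    # order (0, 45, 90, 135) along the channel axis
    return np.stack([i0, i45, i90, i135], axis=0)
```

For a color DoFP sensor, each of these sub-images would itself carry a Bayer pattern, so a demosaicking step follows in practice.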
Benchmark
We introduce PV, the first large-scale color polarization video dataset collected using a
FLIR BFS-U3-51S5PC-C camera. PV covers diverse indoor and outdoor scenes. For indoor
setups, the camera is mounted on a rigid stand, and the objects are placed on a motorized
turntable. The turntable is operated at three speed levels, and the light sources are
adjusted to create different illumination conditions. The objects cover a wide range of
materials, including polarizers, plastics, and frosted surfaces.
For outdoor scenarios, some scenes are recorded using a tripod, such as in a zoo, where
the activities of different animals are captured and variations in fur textures produce
noticeable polarization differences. We also collect driving sequences using an in-vehicle
setup under daytime and nighttime road conditions, which are useful for downstream tasks
such as object recognition. The camera operates at 75 FPS. In total, the dataset consists
of 65 scenes and 117550 frames, with sequence lengths ranging from 200 to 2000 frames.
Figure: overview of representative PV samples across diverse scenes and materials, grouped into indoor, outdoor road and vehicle, animal, nighttime, car, human, glass, metal, and plastic subsets.
Results
Video 1 (2x space, 8x time): S0 and DoLP-AoP visualizations.
Video 2 (2x space, 8x time): S0 and DoLP-AoP visualizations.
Video 3 (4x space, 4x time): S0 and DoLP-AoP visualizations.
Quantitative Results
Table 1. Demosaicking (2x) and 2x frame interpolation.

| VFI Method    | SR Method | PSNR_I ↑ | PSNR_P ↑ | SSIM_I ↑ | SSIM_P ↑ | MAE ↓  |
|---------------|-----------|----------|----------|----------|----------|--------|
| SuperSloMo    | ATD       | 27.322   | 23.073   | 0.847    | 0.657    | 14.626 |
| SuperSloMo    | PIDSR     | 33.977   | 32.021   | 0.939    | 0.818    | 11.843 |
| SuperSloMo    | PUGDiff   | 28.039   | 30.299   | 0.862    | 0.781    | 8.784  |
| VFIT          | ATD       | 28.983   | 22.390   | 0.875    | 0.658    | 14.693 |
| VFIT          | PIDSR     | 34.052   | 32.343   | 0.940    | 0.829    | 11.443 |
| VFIT          | PUGDiff   | 30.225   | 31.043   | 0.897    | 0.800    | 8.907  |
| SCUBA         | ATD       | 28.941   | 22.585   | 0.875    | 0.657    | 15.106 |
| SCUBA         | PIDSR     | 30.471   | 31.289   | 0.894    | 0.796    | 12.769 |
| SCUBA         | PUGDiff   | 30.212   | 30.876   | 0.896    | 0.791    | 9.995  |
| VideoINR      | —         | 29.388   | 22.218   | 0.886    | 0.657    | 11.255 |
| VideoINR-12ch | —         | 32.423   | 29.732   | 0.924    | 0.771    | 8.583  |
| MoTIF         | —         | 29.687   | 22.519   | 0.892    | 0.660    | 11.415 |
| MoTIF-12ch    | —         | 32.292   | 29.859   | 0.933    | 0.788    | 8.558  |
| BF-STVSR      | —         | 29.472   | 22.214   | 0.890    | 0.656    | 11.481 |
| BF-STVSR-12ch | —         | 32.654   | 29.763   | 0.928    | 0.776    | 8.437  |
| Ours          | —         | 34.631   | 33.310   | 0.944    | 0.854    | 5.922  |
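Table 1 reports PSNR and SSIM separately on the intensity (subscript I) and polarization (subscript P) outputs, plus an MAE term. The exact metric definitions are not given in this section; a generic sketch of PSNR and of an angle-aware MAE for AoP (assumed here to be in degrees, consistent with the table's value range) would be:

```python
import numpy as np

def psnr(pred, gt, data_range=1.0):
    """Peak signal-to-noise ratio in dB over arrays in [0, data_range]."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def mae_aop_degrees(pred_aop, gt_aop):
    """Mean absolute angular error for AoP in degrees, wrapping
    differences to respect the pi-periodicity of the angle."""
    diff = np.abs(pred_aop - gt_aop) % np.pi
    diff = np.minimum(diff, np.pi - diff)  # shortest angular distance
    return np.degrees(np.mean(diff))
```

This is only an assumed reading of the metrics; the paper's definitions should be consulted for exact evaluation.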
Table 2. Super-resolution under different spatial-temporal settings. Entries are PSNR_I / PSNR_P / MAE.