Deep Hyperspectral Image Fusion Network with

Iterative Spatio-Spectral Regularization

 

Tao Huang¹    Weisheng Dong¹*    Leida Li¹    Jinjian Wu¹    Xin Li²    Guangming Shi¹

¹School of Artificial Intelligence, Xidian University        ²West Virginia University

 

 

Figure 1. Architecture of the proposed network for hyperspectral image fusion: (a) the overall network; (b) the spatio-spectral regularization module; (c) the 3D filter generator.
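
As a rough illustration of panels (b) and (c), the sketch below shows how a small network can generate a per-pixel 3D filter and apply it over a joint spatio-spectral neighborhood. This is a minimal PyTorch sketch written for this page: the kernel sizes, the two-layer filter generator, and the unfold-based filtering are illustrative assumptions and do not reproduce the exact DHIF-Net modules.

# Hypothetical sketch of spatially adaptive 3D filtering (PyTorch).
# Kernel sizes and the generator layers are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adaptive3DFilter(nn.Module):
    def __init__(self, bands, k_spatial=3, k_spectral=3):
        super().__init__()
        self.ks, self.kb = k_spatial, k_spectral
        n_weights = k_spatial * k_spatial * k_spectral
        # Filter generator: predicts one normalized 3D kernel per spatial location.
        self.generator = nn.Sequential(
            nn.Conv2d(bands, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, n_weights, 3, padding=1))

    def forward(self, x):                                  # x: (B, C, H, W) HSI cube
        b, c, h, w = x.shape
        weights = torch.softmax(self.generator(x), dim=1)  # (B, ks*ks*kb, H, W)
        # Pad the band axis so every band has kb spectral neighbours.
        xp = F.pad(x.unsqueeze(1), (0, 0, 0, 0, self.kb // 2, self.kb // 2)).squeeze(1)
        out = torch.zeros_like(x)
        for d in range(self.kb):                           # loop over spectral offsets
            shifted = xp[:, d:d + c]                       # (B, C, H, W)
            patches = F.unfold(shifted, self.ks, padding=self.ks // 2)
            patches = patches.view(b, c, self.ks * self.ks, h, w)
            wd = weights[:, d * self.ks * self.ks:(d + 1) * self.ks * self.ks]
            out += (patches * wd.unsqueeze(1)).sum(dim=2)  # weighted 3D aggregation
        return out

Because the kernel is predicted from the input itself, such a module can adapt its smoothing to local spatial and spectral structure, which is the intuition behind the spatio-spectral regularization module.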

 

 

Abstract

Physical acquisition of high-resolution hyperspectral images (HR-HSI) remains difficult, despite their potential for resolving material-related ambiguities in vision applications. Deep hyperspectral image fusion, which aims to reconstruct an HR-HSI from a pair consisting of a low-resolution hyperspectral image (LR-HSI) and a high-resolution multispectral image (HR-MSI), has become an appealing computational alternative. Existing fusion methods either rely on hand-crafted image priors or treat fusion as a nonlinear mapping problem, ignoring important physical imaging models. In this paper, we propose a novel regularization strategy that fully exploits the spatio-spectral dependency through a spatially adaptive 3D filter. Moreover, the joint exploitation of spatio-spectral regularization and physical imaging models inspires us to formulate deep hyperspectral image fusion as a differentiable optimization problem. We show how to solve this optimization problem by end-to-end training of a model-guided unfolding network named DHIF-Net. Unlike existing works that simply concatenate spatial and spectral regularization, our approach pursues end-to-end optimization of iterative spatio-spectral regularization through a multistage network implementation. Extensive experiments on both synthetic and real datasets show that DHIF-Net outperforms competing methods in terms of both objective metrics and subjective visual quality.
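
To make the formulation concrete: the LR-HSI is commonly modeled as a blurred and spatially downsampled version of the HR-HSI, and the HR-MSI as a spectrally downsampled version of it. The minimal PyTorch sketch below illustrates one unfolded iteration built on this observation model, i.e., a gradient step on the two data-fidelity terms followed by a learned regularizer; the tensor shapes, the step size eta, and the regularizer placeholder are assumptions for exposition, not the exact DHIF-Net layers.

# Minimal sketch (PyTorch) of one model-guided unfolding step for HSI fusion.
# Shapes, the step size, and the learned prior are illustrative assumptions.
import torch
import torch.nn.functional as F

def blur_downsample(x, kernel, s):
    """Spatial degradation H: per-band blur (kernel of shape (1, 1, k, k)) + s-fold decimation."""
    c = x.shape[1]
    k = kernel.expand(c, 1, -1, -1)                        # same blur for every band
    x = F.conv2d(x, k, padding=kernel.shape[-1] // 2, groups=c)
    return x[:, :, ::s, ::s]

def blur_upsample_adjoint(y, kernel, s, size):
    """(Approximate) adjoint of H: zero-filled upsampling followed by the blur."""
    b, c = y.shape[:2]
    x = torch.zeros(b, c, *size, device=y.device)
    x[:, :, ::s, ::s] = y
    k = kernel.expand(c, 1, -1, -1)
    return F.conv2d(x, k, padding=kernel.shape[-1] // 2, groups=c)

def spectral_downsample(x, srf):
    """Spectral degradation R: mix C hyperspectral bands into m multispectral bands."""
    return torch.einsum('mc,bchw->bmhw', srf, x)

def unfolding_step(x, y_lr_hsi, z_hr_msi, kernel, srf, s, eta, regularizer):
    """One iteration: gradient step on both data terms, then a learned spatio-spectral prior."""
    r1 = blur_upsample_adjoint(blur_downsample(x, kernel, s) - y_lr_hsi,
                               kernel, s, x.shape[-2:])
    r2 = torch.einsum('mc,bmhw->bchw', srf, spectral_downsample(x, srf) - z_hr_msi)
    x = x - eta * (r1 + r2)
    return regularizer(x)          # e.g., the spatially adaptive 3D filtering module

Stacking T such steps and training the whole chain end to end is what turns the iterative optimization into a multistage network.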

 

 

 

Paper


IEEE Transactions on Computational Imaging (TCI), 2022

  

Citation

T. Huang, W. Dong, J. Wu, L. Li, X. Li and G. Shi, "Deep Hyperspectral Image Fusion Network With Iterative Spatio-Spectral Regularization," in IEEE Transactions on Computational Imaging, vol. 8, pp. 201-214, 2022, doi: 10.1109/TCI.2022.3152700.

 

Bibtex

@ARTICLE{Huang2022Deep,
  author={Huang, Tao and Dong, Weisheng and Wu, Jinjian and Li, Leida and Li, Xin and Shi, Guangming},
  journal={IEEE Transactions on Computational Imaging},
  title={Deep Hyperspectral Image Fusion Network With Iterative Spatio-Spectral Regularization},
  year={2022},
  volume={8},
  number={},
  pages={201-214},
  doi={10.1109/TCI.2022.3152700}}

 

 

 

 

Download

 

 


                            

Code    |    Dataset (CAVE)    |    Dataset (Harvard)    |    WV2 (real data)

 

 

 

 

Results

Comparison with State-of-the-Art Fusion Methods:

Table 1. Average PSNR (dB), SAM, ERGAS, and SSIM of the competing methods on the CAVE and Harvard datasets for a Gaussian blur kernel and scaling factors 8, 16, and 32 (a sketch of these quality indices is given after the table).

Scaling factor 8
                 ------------ CAVE ------------    ----------- Harvard -----------
Method           PSNR    SAM     ERGAS   SSIM      PSNR    SAM     ERGAS   SSIM
CSU [1]          41.23   6.58    1.15    0.982     45.41   3.88    1.39    0.984
HySure [2]       37.04   11.19   1.85    0.960     42.02   4.67    1.79    0.977
CSTF [3]         42.34   6.48    0.98    0.975     42.24   5.16    1.62    0.961
NSSR [4]         44.07   4.40    0.83    0.987     46.08   3.68    1.28    0.984
DHSIS [5]        46.48   3.89    0.66    0.992     46.53   3.57    1.27    0.985
MHF-net [6]      46.46   4.37    0.67    0.992     46.89   3.61    1.27    0.985
DBIN [7]         48.73   3.11    0.55    0.994     47.39   3.47    1.19    0.985
DHIF-Net         49.79   3.01    0.51    0.995     47.55   3.40    1.14    0.986

Scaling factor 16
                 ------------ CAVE ------------    ----------- Harvard -----------
Method           PSNR    SAM     ERGAS   SSIM      PSNR    SAM     ERGAS   SSIM
CSU [1]          38.73   8.58    0.76    0.970     44.18   4.23    0.77    0.982
HySure [2]       32.17   17.55   1.59    0.908     37.92   5.75    1.28    0.959
CSTF [3]         40.59   7.57    0.59    0.971     42.94   5.41    0.83    0.961
NSSR [4]         39.62   6.46    0.68    0.979     44.44   3.98    0.79    0.982
DHSIS [5]        39.71   6.00    0.74    0.976     45.63   3.88    0.70    0.984
MHF-net [6]      44.56   4.76    0.40    0.990     46.24   3.73    0.64    0.984
DBIN [7]         44.15   4.00    0.43    0.992     46.38   3.61    0.63    0.985
DHIF-Net         46.46   3.66    0.34    0.993     46.61   3.60    0.60    0.985

Scaling factor 32
                 ------------ CAVE ------------    ----------- Harvard -----------
Method           PSNR    SAM     ERGAS   SSIM      PSNR    SAM     ERGAS   SSIM
CSU [1]          36.52   9.61    0.48    0.959     42.27   4.74    0.43    0.978
HySure [2]       26.62   23.36   1.60    0.822     35.10   8.04    0.82    0.931
CSTF [3]         38.99   9.08    0.34    0.968     41.72   5.39    0.43    0.962
NSSR [4]         36.78   10.46   0.46    0.974     43.08   4.39    0.47    0.982
DHSIS [5]        38.36   8.45    0.45    0.963     44.44   4.29    0.38    0.983
MHF-net [6]      42.17   6.03    0.26    0.986     45.26   4.00    0.34    0.984
DBIN [7]         40.72   4.86    0.33    0.989     44.91   0.94    0.34    0.984
DHIF-Net         42.83   4.89    0.24    0.989     45.55   3.80    0.31    0.985
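
For reference, the quality indices reported above can be computed as in the minimal NumPy sketch below, using common definitions (PSNR in dB averaged over bands, SAM in degrees averaged over pixels, ERGAS with the scaling factor of the corresponding sub-table); SSIM is typically taken from an off-the-shelf per-band implementation. These definitions are assumptions about the evaluation protocol, not the code released with the paper.

# Common definitions of the reported indices (NumPy); assumptions, not the paper's code.
import numpy as np

def psnr(ref, est, data_range=1.0):
    """Mean PSNR (dB) over bands; ref/est are (H, W, C) cubes in [0, data_range]."""
    mse = np.mean((ref - est) ** 2, axis=(0, 1))             # per-band MSE
    return float(np.mean(10 * np.log10(data_range ** 2 / np.maximum(mse, 1e-12))))

def sam(ref, est, eps=1e-12):
    """Mean spectral angle (degrees) between per-pixel spectra."""
    dot = np.sum(ref * est, axis=2)
    norm = np.linalg.norm(ref, axis=2) * np.linalg.norm(est, axis=2)
    angle = np.arccos(np.clip(dot / np.maximum(norm, eps), -1.0, 1.0))
    return float(np.degrees(angle).mean())

def ergas(ref, est, scale):
    """ERGAS for spatial scaling factor `scale` (8, 16, or 32 in Table 1)."""
    rmse = np.sqrt(np.mean((ref - est) ** 2, axis=(0, 1)))   # per-band RMSE
    mean = np.maximum(np.mean(ref, axis=(0, 1)), 1e-12)
    return float(100.0 / scale * np.sqrt(np.mean((rmse / mean) ** 2)))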

 

 

Figure 2. Reconstructed images of jelly beans in the CAVE dataset at 700 nm with a Gaussian blur kernel and scaling factor s = 8. The first row shows the reconstructed images, and the second row shows the error images of the competing methods. (a) the RGB image and the reconstructed spectra of the selected patch (indicated by a green box); (b) the ground truth image; (c) the NSSR method [4] (PSNR = 39.49 dB, SAM = 4.00, ERGAS = 1.10, SSIM = 0.984); (d) the DHSIS method [5] (PSNR = 41.98 dB, SAM = 3.88, ERGAS = 0.79, SSIM = 0.987); (e) the MHF-net method [6] (PSNR = 44.04 dB, SAM = 3.58, ERGAS = 0.64, SSIM = 0.992); (f) the DBIN method [7] (PSNR = 43.76 dB, SAM = 3.00, ERGAS = 0.60, SSIM = 0.993); (g) the proposed DHIF-Net method (PSNR = 46.09 dB, SAM = 2.66, ERGAS = 0.52, SSIM = 0.995).

 

 

Figure 3. Reconstructed images of paints in the CAVE dataset at 670 nm with noise level σ = 30 and scaling factor s = 8. The first row shows the reconstructed images, and the second row shows the error images of the competing methods. (a) the RGB image and the reconstructed spectra of the selected patch (indicated by a green box); (b) the NSSR method [4] (PSNR = 26.49 dB, SAM = 24.58, ERGAS = 3.24, SSIM = 0.613); (c) the MHF-net method [6] (PSNR = 30.66 dB, SAM = 10.90, ERGAS = 1.97, SSIM = 0.892); (d) the DBIN method [7] (PSNR = 30.20 dB, SAM = 8.79, ERGAS = 2.10, SSIM = 0.939); (e) the proposed DHIF-Net (T = 1) method (PSNR = 29.87 dB, SAM = 10.17, ERGAS = 2.18, SSIM = 0.930); (f) the proposed DHIF-Net (T = 4) method (PSNR = 31.20 dB, SAM = 8.19, ERGAS = 1.87, SSIM = 0.943); (g) the ground truth image.
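
The simulated inputs in Figures 2 and 3 follow the usual degradation protocol: the LR-HSI is produced by blurring the ground-truth HR-HSI with a Gaussian kernel, decimating by the scaling factor s, and (for Figure 3) adding Gaussian noise, while the HR-MSI is obtained by projecting the ground truth through a spectral response function. The NumPy sketch below illustrates this protocol; the blur width, the spectral response matrix srf, and adding noise only to the LR-HSI are assumptions rather than the exact experimental settings.

# Hypothetical sketch of the simulated test-pair generation (NumPy).
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_pair(hr_hsi, srf, s=8, blur_sigma=2.0, noise_sigma=0.0, seed=0):
    """hr_hsi: (H, W, C) ground truth in [0, 1]; srf: (m, C) spectral response matrix."""
    rng = np.random.default_rng(seed)
    # LR-HSI: per-band Gaussian blur, s-fold decimation, optional Gaussian noise
    # (noise_sigma is given on a 0-255 scale, e.g. 30 for Figure 3).
    blurred = gaussian_filter(hr_hsi, sigma=(blur_sigma, blur_sigma, 0))
    lr_hsi = blurred[::s, ::s, :]
    if noise_sigma > 0:
        lr_hsi = lr_hsi + rng.normal(0.0, noise_sigma / 255.0, lr_hsi.shape)
    # HR-MSI: spectral projection of the ground truth through the SRF.
    hr_msi = np.tensordot(hr_hsi, srf.T, axes=([2], [0]))    # (H, W, m)
    return lr_hsi, hr_msi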

 

 

 

 

References

[1] C. Lanaras, E. Baltsavias, and K. Schindler, “Hyperspectral super-resolution by coupled spectral unmixing,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3586–3594.

[2] M. Simoes, J. Bioucas-Dias, L. B. Almeida, and J. Chanussot, “A convex formulation for hyperspectral image superresolution via subspace-based regularization,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 6, pp. 3373–3388, 2015.

[3] S. Li, R. Dian, L. Fang, and J. M. Bioucas-Dias, “Fusing hyperspectral and multispectral images via coupled sparse tensor factorization,” IEEE Transactions on Image Processing, vol. 27, no. 8, pp. 4118–4130, 2018.

[4] W. Dong, F. Fu, G. Shi, X. Cao, J. Wu, G. Li, and X. Li, “Hyperspectral image super-resolution via non-negative structured sparse representation,” IEEE Transactions on Image Processing, vol. 25, no. 5, pp. 2337–2352, 2016.

[5] R. Dian, S. Li, A. Guo, and L. Fang, “Deep hyperspectral image sharpening,” IEEE Transactions on Neural Networks and Learning Systems, no. 99, pp. 1–11, 2018.

[6] Q. Xie, M. Zhou, Q. Zhao, D. Meng, W. Zuo, and Z. Xu, “Multispectral and hyperspectral image fusion by MS/HS fusion net,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 1585–1594.

[7] W. Wang, W. Zeng, Y. Huang, X. Ding, and J. Paisley, “Deep blind hyperspectral image fusion,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 4150–4159.

 

 

 

 

Contact

Tao Huang, Email: thuang_666@stu.xidian.edu.cn

Weisheng Dong, Email: wsdong@mail.xidian.edu.cn

Leida Li, Email: ldli@xidian.edu.cn

Jinjian Wu, Email: jinjian.wu@mail.xidian.edu.cn

Xin Li, Email: xin.li@ieee.org

Guangming Shi, Email: gmshi@xidian.edu.cn