Deep Learning-Assisted Distribution-Based Statistical Visualization and Analysis of Ensemble Scientific Data

黃瀚; Huang, Han


Files

202500046650-108981.pdf (13.86 MB)

Date

2025

Authors

黃瀚

Huang, Han

Abstract

To study complex real-world phenomena using computer simulations, scientists often rely on ensemble datasets generated from multiple simulation runs with varying parameter configurations. This process can produce extreme-scale datasets, making traditional data analysis pipelines impractical due to limited I/O bandwidth and disk capacity. Distribution-based data representations have been proposed as a promising solution. Processing data in situ to generate compact distribution-based representations not only alleviates the challenges of limited I/O bandwidth and disk capacity but also enables uncertainty quantification, thus mitigating the risk of misinterpretation. Nevertheless, distribution-based methods inherently sacrifice the spatial information of the data samples within each distribution, potentially reducing precision in the data analysis pipeline. To address this issue, we introduce a deep learning model that reconstructs data volumes from the distribution representation. Instead of using a model that predicts a data block directly from its distribution representation, we propose a deep learning model based on the Gumbel-Sinkhorn Neural Network (GSNN) that learns to map samples drawn from a block's distribution to spatial locations within the block. The model supports high-quality downstream data analysis and visualization, provides point-wise uncertainty quantification, and guarantees that the distribution of each reconstructed data block follows the block's distribution representation.
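The abstract does not specify implementation details, so the following is only a minimal NumPy sketch, under stated assumptions, of the two ideas it names: drawing samples from a block's distribution representation and placing them at spatial locations through a (relaxed) permutation. Assumptions: the per-block representation is taken to be a normalized histogram; random scores stand in for the trained GSNN's output; a greedy matching stands in for the Hungarian algorithm when rounding to a hard permutation. All function names (`block_histogram`, `sample_from_histogram`, `gumbel_sinkhorn`, `hard_permutation`) are hypothetical.

```python
import numpy as np


def block_histogram(block, bins, value_range):
    # Assumed distribution-based summary of one block: bin edges + normalized counts.
    counts, edges = np.histogram(block, bins=bins, range=value_range)
    return edges, counts / counts.sum()


def sample_from_histogram(edges, probs, n, rng):
    # Pick a bin by its probability, then draw a uniform value inside that bin.
    idx = rng.choice(len(probs), size=n, p=probs)
    return rng.uniform(edges[idx], edges[idx + 1])


def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis, keeping dimensions.
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def gumbel_sinkhorn(log_alpha, tau, n_iters, rng):
    # Perturb scores with Gumbel noise, divide by temperature tau, then
    # alternate row/column normalization in log space (Sinkhorn iterations).
    x = (log_alpha + rng.gumbel(size=log_alpha.shape)) / tau
    for _ in range(n_iters):
        x = x - _logsumexp(x, axis=1)  # rows sum to 1
        x = x - _logsumexp(x, axis=0)  # columns sum to 1
    return np.exp(x)  # approximately doubly stochastic


def hard_permutation(P):
    # Greedy stand-in for Hungarian matching; always returns a valid permutation.
    P = P.copy()
    perm = np.empty(P.shape[0], dtype=int)
    for _ in range(P.shape[0]):
        i, j = np.unravel_index(np.argmax(P), P.shape)
        perm[i] = j  # sample i is placed at location j
        P[i, :] = -np.inf
        P[:, j] = -np.inf
    return perm


rng = np.random.default_rng(0)
block = rng.normal(size=64)  # one flattened 4x4x4 block (synthetic)
edges, probs = block_histogram(block, bins=8, value_range=(-4.0, 4.0))
samples = sample_from_histogram(edges, probs, block.size, rng)

# Placeholder for the network's sample-to-location scores (hypothetical).
log_alpha = rng.normal(size=(block.size, block.size))
P = gumbel_sinkhorn(log_alpha, tau=0.5, n_iters=50, rng=rng)
perm = hard_permutation(P)

recon = np.empty_like(samples)
recon[perm] = samples  # scatter each sample to its assigned location

# The reconstructed block holds exactly the drawn samples, so its
# histogram matches the samples' histogram bin for bin.
_, p_recon = block_histogram(recon, bins=8, value_range=(-4.0, 4.0))
_, p_samp = block_histogram(samples, bins=8, value_range=(-4.0, 4.0))
print(np.allclose(p_recon, p_samp))
```

Because the final assignment is a permutation, the reconstruction only rearranges the drawn values; this is one way to see why a sample-to-location mapping can preserve a block's distribution by construction, whereas a model that regresses block values directly carries no such guarantee.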

Keywords

Deep learning, distribution-based representation, in situ data processing, large ensemble data

URI

https://etds.lib.ntnu.edu.tw/thesis/detail/7b113f188914eedc1b15d14567d21a63/
http://rportal.lib.ntnu.edu.tw/handle/20.500.12235/125794

Collections

Theses and Dissertations
