Publishing House SB RAS:

Address of the Publishing House SB RAS:
Morskoy pr. 2, 630090 Novosibirsk, Russia

Russian Geology and Geophysics

2025, No. 2


I.A. Lisenkov¹, A.A. Soloviev¹,², V.A. Kuznetsov³, Yu.I. Nikolova¹
¹Geophysical Center of the Russian Academy of Sciences, Moscow, Russia
²Schmidt Institute of Physics of the Earth of the Russian Academy of Sciences, Moscow, Russia
³National Research Nuclear University MEPhI, Moscow, Russia
Keywords: Geophysics, geology, geospatial data, machine learning, AutoML, GIS, PostgreSQL, Python, Hadoop, Russian Arctic


The article presents a practical approach to collecting and preprocessing geological and geophysical spatial data for use in machine learning models in geophysical applications. According to established principles for estimating effort in data analysis, which are confirmed by surveys of specialists, this stage is the most time- and resource-consuming, accounting for up to 80% of the total data analysis effort in a hypothesis-testing project. The paper focuses on creating a consistent data set that integrates geological and geophysical information on a given region. We consider the problems that arise because geodata come from different sources, which differ in format (vector/raster), scale, type of attribute information (quantitative/qualitative), and availability. A critical aspect is the formalization and synthesis of algorithms for combining geospatial data and converting them into quantitative vectors. The data are combined using the concept of a neighborhood, consistent with the data selection techniques and the data consolidation strategy. The paper presents the general architecture of the hardware and software system, which includes a data collection and transformation module written in Python with the Pandas library and a data storage system based on the PostgreSQL database management system (DBMS) with the PostGIS extension. It is shown that, for the class of geophysical problems considered, a relational DBMS is sufficient for storing and processing the data. For larger problem sizes, the system is proposed to be scaled using Big Data technology based on Apache Hadoop. The practical application of the approach is demonstrated by the results of data collection for the Caucasus region and the eastern sector of the Russian Arctic.
Using the prepared data, experiments were carried out with machine learning models to recognize potential locations of strong earthquakes and to estimate the sensitivity to several geophysical features of these regions. The article presents the experimental results and an evaluation of their effectiveness.
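The abstract says that heterogeneous geodata are combined through a neighborhood concept and turned into quantitative vectors. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a set of grid nodes given as (lon, lat), point objects from different source layers that carry either a quantitative value or a qualitative category, and a fixed search radius in kilometres. For each node, it aggregates the objects within that radius into a fixed-length vector: the object count, the maximum and mean of the quantitative values, and the share of each category. The names `GeoObject`, `haversine_km`, `neighborhood_features`, and `build_matrix` are hypothetical, as are the feature layout and the choice to fill empty neighborhoods with 0.0. Plain Python is used so the sketch is self-contained; a production pipeline would more likely run such queries in PostGIS or Pandas.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class GeoObject:
    """A point object from one source layer (hypothetical structure)."""
    lon: float
    lat: float
    value: Optional[float] = None     # quantitative attribute, if any
    category: Optional[str] = None    # qualitative attribute, if any


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighborhood_features(node: Tuple[float, float], objects: Sequence[GeoObject],
                          radius_km: float, categories: Sequence[str]) -> List[float]:
    """Fixed-length vector for one node: [count, max_value, mean_value, share_cat_1, ...].

    Assumption: statistics over an empty set of values or categories are filled with 0.0.
    """
    lon, lat = node
    near = [o for o in objects if haversine_km(lon, lat, o.lon, o.lat) <= radius_km]
    values = [o.value for o in near if o.value is not None]
    max_value = max(values) if values else 0.0
    mean_value = sum(values) / len(values) if values else 0.0
    # Qualitative attributes become per-category shares among labelled neighbors.
    labelled = [o.category for o in near if o.category is not None]
    shares = [labelled.count(c) / len(labelled) if labelled else 0.0 for c in categories]
    return [float(len(near)), max_value, mean_value, *shares]


def build_matrix(nodes: Sequence[Tuple[float, float]], objects: Sequence[GeoObject],
                 radius_km: float) -> Tuple[List[str], List[List[float]]]:
    """Return the sorted category list and one feature row per grid node."""
    categories = sorted({o.category for o in objects if o.category is not None})
    rows = [neighborhood_features(n, objects, radius_km, categories) for n in nodes]
    return categories, rows


# Usage with made-up illustrative objects (not data from the article).
objects = [
    GeoObject(40.0, 43.0, value=5.1),
    GeoObject(40.1, 43.05, value=6.0),
    GeoObject(40.05, 43.0, category="granite"),
    GeoObject(45.0, 45.0, category="basalt"),
]
categories, X = build_matrix([(40.0, 43.0), (45.0, 45.0)], objects, radius_km=20.0)
print(categories, X)
```

Every node receives a vector of the same length whatever mix of sources lies around it. This is the property a downstream machine learning model needs. The radius acts as the tunable neighborhood parameter.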