Dear ECMWF,
I find the Copernicus data very important; however, the NetCDF files are very difficult to work with. I've been trying different scripts in Python and R to extract climate data from NetCDF files into CSV for further analysis, but it has proved a very difficult task.
I would like to ask for your help.
I need to extract climate data (precipitation, average temperature, etc.) for European countries using the data provided by Copernicus - "Temperature and precipitation climate impact indicators from 1970 to 2100 derived from European climate projections". Ideally I would like to have a panel data set, with columns for year and country, and then separate columns for each climate variable.
I've tried the Python script recommended by this page - "How to convert NetCDF to CSV"
This is the code used:
#this is for reading the .nc in the working folder
import glob
#this is required to read the netCDF4 data
from netCDF4 import Dataset
#required to read and write the csv files
import pandas as pd
#required for using the array functions
import numpy as np
from matplotlib.dates import num2date
data = Dataset('prAdjust_tmean.nc')
This is what the data contents look like:
print(data)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    CDI: Climate Data Interface version 1.8.2 (http://mpimet.mpg.de/cdi)
    frequency: year
    CDO: Climate Data Operators version 1.8.2 (http://mpimet.mpg.de/cdo)
    creation_date: 2020-02-12T15:00:49ZCET+0100
    Conventions: CF-1.6
    institution_url: www.smhi.se
    invar_platform_id: -
    invar_rcm_model_driver: MPI-M-MPI-ESM-LR
    time_coverage_start: 1971
    time_coverage_end: 2000
    domain: EUR-11
    geospatial_lat_min: 23.942343
    geospatial_lat_max: 72.641624
    geospatial_lat_resolution: 0.04268074 degree
    geospatial_lon_min: -35.034023
    geospatial_lon_max: 73.937675
    geospatial_lon_resolution: 0.009246826 degree
    geospatial_bounds: -
    NCO: netCDF Operators version 4.7.7 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)
    acknowledgements: This work was performed within Copernicus Climate Change Service - C3S_424_SMHI, https://climate.copernicus.eu/operational-service-water-sector, on behalf of ECMWF and EU.
    contact: Hydro.fou@smhi.se
    keywords: precipitation
    license: Copernicus License V1.2
    output_frequency: 30 year average value
    summary: Calculated as the mean annual values of daily precipitation averaged over a 30 year period.
    comment: The Climate Data Operators (CDO) software was used for the calculation of climate impact indicators (https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf, https://code.mpimet.mpg.de/projects/cdo/embedded/cdo_eca.pdf).
    history: CDO commands (last cdo command first and separated with ;): timmean; yearmean
    invar_bc_institution: Swedish Meteorological and Hydrological Institute
    invar_bc_method: TimescaleBC, Description in deliverable C3S_D424.SMHI.1.3b
    invar_bc_method_id: TimescaleBC v1.02
    invar_bc_observation: EFAS-Meteo, https://ec.europa.eu/jrc/en/publication/eur-scientific-and-technical-research-reports/efas-meteo-european-daily-high-resolution-gridded-meteorological-data-set-1990-2011
    invar_bc_observation_id: EFAS-Meteo
    invar_bc_period: 1990-2018
    data_quality: Testing of EURO-CORDEX data performed by ESGF nodes. Additional tests were performed when producing CII and ECVs in C3S_424_SMHI.
    institution: SMHI
    project_id: C3S_424_SMHI
    references:
    source: The RCM data originate from EURO-CORDEX (Coordinated Downscaling Experiment - European Domain, EUR-11) https://euro-cordex.net/.
    invar_experiment_id: rcp45
    invar_realisation_id: r1i1p1
    invar_rcm_model_id: MPI-CSC-REMO2009-v1
    variable_name: prAdjust_tmean
    dimensions(sizes): x(1000), y(950), time(1), bnds(2)
    variables(dimensions): float32 lon(y,x), float32 lat(y,x), float64 time(time), float64 time_bnds(time,bnds), float32 prAdjust_tmean(time,y,x)
    groups:
After that I extract the needed variable:
t2m = data.variables['prAdjust_tmean']
# Get dimensions assuming 3D: time, latitude, longitude
time_dim, lat_dim, lon_dim = t2m.get_dims()
time_var = data.variables[time_dim.name]
times = num2date(time_var[:], time_var.units)
latitudes = data.variables[lat_dim.name][:]
longitudes = data.variables[lon_dim.name][:]
output_dir = './'
And the Error:
OverflowError                             Traceback (most recent call last)
<ipython-input-9-69e10e41e621> in <module>
      2 time_dim, lat_dim, lon_dim = t2m.get_dims()
      3 time_var = data.variables[time_dim.name]
----> 4 times = num2date(time_var[:], time_var.units)
      5 latitudes = data.variables[lat_dim.name][:]
      6 longitudes = data.variables[lon_dim.name][:]

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py in num2date(x, tz)
    509     if tz is None:
    510         tz = _get_rc_timezone()
--> 511     return _from_ordinalf_np_vectorized(x, tz).tolist()
    512
    513

C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py in __call__(self, *args, **kwargs)
   2106             vargs.extend([kwargs[_n] for _n in names])
   2107
-> 2108         return self._vectorize_call(func=func, args=vargs)
   2109
   2110     def _get_ufunc_and_otypes(self, func, args):

C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py in _vectorize_call(self, func, args)
   2190                       for a in args]
   2191
-> 2192             outputs = ufunc(*inputs)
   2193
   2194         if ufunc.nout == 1:

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py in _from_ordinalf(x, tz)
    329
    330     dt = (np.datetime64(get_epoch()) +
--> 331           np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))
    332     if dt < np.datetime64('0001-01-01') or dt >= np.datetime64('10000-01-01'):
    333         raise ValueError(f'Date ordinal {x} converts to {dt} (using '

OverflowError: int too big to convert
And this is the last part of the script:
import os
# Path
path = "/home"
# Join various path components
print(os.path.join(path, "User/Desktop", "file.txt"))
# Path
path = "User/Documents"
# Join various path components
print(os.path.join(path, "/home", "file.txt"))
filename = os.path.join(output_dir, 'table.csv')
print(f'Writing data in tabular form to {filename} (this may take some time)...')
times_grid, latitudes_grid, longitudes_grid = [
x.flatten() for x in np.meshgrid(times, latitudes, longitudes, indexing='ij')]
df = pd.DataFrame({
'time': [t.isoformat() for t in times_grid],
'latitude': latitudes_grid,
'longitude': longitudes_grid,
't2m': t2m[:].flatten()})
df.to_csv(filename, index=False)
print('Done')
Once again, I would like to stress that I need to extract these specific columns from the .nc file: year (time period), longitude, latitude, and the climate variables (temperature, precipitation, etc.). I hope you can help me with this task.
Many thanks for your time and help.
Best regards, Marian
2 Comments
Milana Vuckovic
Hi Marian,
You can do what you need using Python libraries pandas and xarray.
So something like this:
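A minimal sketch: xarray opens the NetCDF file and can flatten the (time, y, x) cube straight into a pandas DataFrame, one row per grid cell per time step. The tiny synthetic dataset below just stands in for your prAdjust_tmean.nc so the snippet is self-contained - in practice you would replace it with xr.open_dataset('prAdjust_tmean.nc').

```python
import numpy as np
import pandas as pd
import xarray as xr

# In practice: ds = xr.open_dataset('prAdjust_tmean.nc')
# A tiny synthetic (time, y, x) dataset stands in for the real file here.
ds = xr.Dataset(
    {'prAdjust_tmean': (('time', 'y', 'x'),
                        np.arange(12, dtype='float32').reshape(3, 2, 2))},
    coords={'time': pd.to_datetime(['1971-01-01', '1972-01-01', '1973-01-01']),
            'y': [0.0, 1.0],
            'x': [0.0, 1.0]},
)

# Flatten the cube into tidy rows: one row per grid cell per time step.
df = ds['prAdjust_tmean'].to_dataframe().reset_index()
df.to_csv('table.csv', index=False)
print(df.head())
```

reset_index() turns the (time, y, x) MultiIndex into ordinary columns, so the CSV has exactly the columns you asked about: time, coordinates, and the climate variable.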
If pandas complains that your data is too big, try working with a smaller chunk until you manage to do what you need, and then scale up.
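One way to chunk it is to convert one band of the grid at a time and write each band to its own CSV, so the full cube is never in memory as a DataFrame at once. This sketch again uses a synthetic stand-in dataset (replace it with xr.open_dataset('prAdjust_tmean.nc')), and the band size is arbitrary:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real file; in practice:
#   ds = xr.open_dataset('prAdjust_tmean.nc')
ds = xr.Dataset(
    {'prAdjust_tmean': (('time', 'y', 'x'),
                        np.arange(12, dtype='float32').reshape(3, 2, 2))},
    coords={'time': pd.to_datetime(['1971-01-01', '1972-01-01', '1973-01-01']),
            'y': [0.0, 1.0],
            'x': [0.0, 1.0]},
)

# Convert one band of rows (along the 'y' dimension) at a time instead of
# the whole grid; each band goes to its own CSV. The band size is arbitrary.
band = 1
for start in range(0, ds.sizes['y'], band):
    subset = ds['prAdjust_tmean'].isel(y=slice(start, start + band))
    subset.to_dataframe().reset_index().to_csv(
        f'table_{start}.csv', index=False)
```

The partial CSVs can then be concatenated, or loaded one at a time for analysis.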
If you want to learn more, you might find these resources useful:
https://gitlab.com/eu_hidalgo/ecmwf_enccs_training
I hope this helps!
Milana
Marian Dobranschi
Dear Milana,
I've tried your script. It extracts some data, but it fails to extract yearly data; the output covers just one year, which is not helpful at all.
I need the average temperature at a given latitude and longitude for every year stored in the .nc file. The file holds data from 1971-2000, but your script only extracts data for 2000. This issue gets more and more complicated, and the netCDF files are impossible to work with. The climate data are needed in a non-spatial, non-array format so they can be included in regression analysis as standalone variables. These variables should be time series, i.e. yearly (or monthly, or daily) observations per country. Array data cannot be used in multivariate regressions, because the time series must be comparable: for example, GDP per year per country and precipitation per year per country should match in form, shape, and length. Combining array and non-array data goes against the practice and rationale of statistical software, which is why we need a way to extract array data into a non-array format, so it can be used for something other than plotting (which is what array data is best for).