# Xarray for multidimensional gridded data


In the previous set of lectures, we saw how Pandas provided a way to keep track of additional “metadata” surrounding tabular datasets, including “indexes” for each row and labels for each column. These features, together with Pandas’ many useful routines for all kinds of data munging and analysis, have made Pandas one of the most popular Python packages in the world.

However, not all Earth science datasets easily fit into the “tabular” model (i.e. rows and columns) imposed by Pandas. In particular, we often deal with *multidimensional data*. By *multidimensional data* (also often called *N-dimensional*), I mean data with many independent dimensions or axes. For example, we might represent Earth’s surface temperature \(T\) as a three-dimensional variable

\[ T(x, y, t) \]

where \(x\) is longitude, \(y\) is latitude, and \(t\) is time.

The point of xarray is to provide pandas-level convenience for working with this type of data.
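To make this concrete, here is a minimal sketch of how a temperature field depending on \(x\), \(y\), and \(t\) might be held in an xarray `DataArray`. The grid sizes, coordinate values, and the random "temperature" numbers are all invented for illustration; they are not real observations.

```python
import numpy as np
import xarray as xr

# Hypothetical grid: 4 longitudes, 3 latitudes, 2 time steps.
lon = np.array([0.0, 90.0, 180.0, 270.0])
lat = np.array([-45.0, 0.0, 45.0])
time = np.array(["2020-01-01", "2020-02-01"], dtype="datetime64[ns]")

# Synthetic values standing in for temperature (assumption, not real data).
rng = np.random.default_rng(0)
values = 15.0 + rng.standard_normal((time.size, lat.size, lon.size))

# Each axis of the array gets a name and a set of labels.
T = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="T",
    attrs={"units": "degC", "long_name": "surface temperature"},
)
```

The array itself is ordinary numpy data; what xarray adds is the dimension names, the coordinate labels, and the attributes that travel with it.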

## Learning Goals for Xarray

Because of the importance of xarray for data analysis in geoscience, we are going to spend a long time on it. The goals of this section include the following.

### Lesson 1: Xarray Fundamentals

#### Dataset Creation

- Describe the core xarray data structures, the `DataArray` and the `Dataset`, and the components that make them up, including: Data Variables, Dimensions, Coordinates, Indexes, and Attributes
- Create xarray `DataArrays` and `Datasets` out of raw numpy arrays
- Create xarray objects with and without indexes
- Load xarray datasets from netCDF files and OPeNDAP servers
- View and set attributes
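As a preview of these goals, the following sketch builds a `DataArray` with and without indexes from a raw numpy array, combines two variables into a `Dataset`, and sets attributes. The variable names (`a`, `b`), coordinate values, and attribute strings are made up for the example.

```python
import numpy as np
import xarray as xr

data = np.arange(6.0).reshape(2, 3)

# Without coordinates: the dimensions have names but no index labels.
da_plain = xr.DataArray(data, dims=("y", "x"))

# With coordinates: each dimension gets a labeled index.
da_indexed = xr.DataArray(
    data,
    dims=("y", "x"),
    coords={"y": [10.0, 20.0], "x": [100.0, 110.0, 120.0]},
)

# A Dataset holds several data variables that share coordinates.
ds = xr.Dataset(
    {"a": da_indexed, "b": da_indexed * 2},
    attrs={"title": "toy dataset"},
)

# Attributes are plain dictionaries on the Dataset and on each variable.
ds["a"].attrs["units"] = "m"
```

Loading from a netCDF file follows the same pattern, with `xr.open_dataset` returning a `Dataset` instead of one built by hand.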

#### Basic Indexing and Interpolation

- Select data by position using `.isel` with values or slices
- Select data by label using `.sel` with values or slices
- Select timeseries data by date/time with values or slices
- Use nearest-neighbor lookups with `.sel`
- Mask data with `.where`
- Interpolate data in one and several dimensions
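The selection methods listed above might look like this on a small toy array. The dates, latitudes, and values are invented; only the method names come from xarray itself.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy array: 6 daily time steps by 3 latitudes (values are just 0..17).
time = pd.date_range("2000-01-01", periods=6, freq="D")
da = xr.DataArray(
    np.arange(18.0).reshape(6, 3),
    dims=("time", "lat"),
    coords={"time": time, "lat": [10.0, 20.0, 30.0]},
)

first = da.isel(time=0)                                   # select by integer position
at_20 = da.sel(lat=20.0)                                  # select by coordinate label
window = da.sel(time=slice("2000-01-02", "2000-01-04"))   # label slices include both ends
nearest = da.sel(lat=22.0, method="nearest")              # picks the closest label, lat=20
masked = da.where(da > 5)                                 # NaN wherever the condition is False
```

Interpolation uses the same keyword style, for example `da.interp(lat=15.0)`, and typically relies on SciPy being installed.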

#### Basic Computation

- Do basic arithmetic with DataArrays and Datasets
- Use numpy universal functions on DataArrays and Datasets, or use the corresponding built-in xarray methods
- Combine multiple xarray objects in arithmetic operations and understand how they are broadcast / aligned
- Perform aggregation (reduction) along one or multiple dimensions of a DataArray or Dataset
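A short sketch of these computations, using tiny hand-written arrays so the results are easy to check by eye. The array names and values are arbitrary.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10.0, 20.0], dims="y", coords={"y": [0, 1]})

scaled = a * 2 + 1      # elementwise arithmetic
root = np.sqrt(a)       # a numpy ufunc returns a DataArray, not a bare array
outer = a + b           # broadcasting by dimension name gives dims (x, y)

# Alignment: with the default inner join, only labels present in both survive.
c = xr.DataArray([100.0, 200.0], dims="x", coords={"x": [1, 2]})
aligned = a + c         # result is defined only at x = 1 and x = 2

total = outer.sum()                  # reduce over all dimensions
mean_over_y = outer.mean(dim="y")    # reduce along a single named dimension
```

Note that broadcasting and alignment are driven by dimension names and coordinate labels, not by array shape or axis order as in plain numpy.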

#### Basic Plotting

- Use built-in xarray plotting for 1D and 2D DataArrays
- Customize plots with options
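For a sense of what the built-in plotting looks like, here is a sketch that assumes matplotlib is installed. The field is synthetic, and the `Agg` backend is chosen only so the example runs without a display; in a notebook you would normally omit that line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Synthetic 2D field that varies with latitude only.
lat = np.linspace(-60, 60, 5)
lon = np.linspace(0, 300, 6)
field = xr.DataArray(
    np.cos(np.deg2rad(lat))[:, None] * np.ones(lon.size),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    name="toy_field",
)

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10, 4))

# A 1D DataArray is drawn as a line plot.
lines = field.isel(lon=0).plot(ax=ax1)

# A 2D DataArray is drawn with pcolormesh by default; keyword options customize it.
mesh = field.plot(ax=ax2, cmap="viridis")

ax1.set_title("1D slice")
ax2.set_title("2D field")
```

Axis labels are filled in from the coordinate names and the array's `name`, which is one of the practical payoffs of carrying metadata with the data.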

### Lesson 2: Advanced Usage

#### Xarray’s groupby, resample, and rolling

- Split xarray objects into groups using `groupby`
- Apply reduction operations to groups (e.g. mean)
- Apply non-reducing functions to groups (e.g. standardize)
- Use `groupby` with time coordinates (e.g. to create climatologies)
- Use arithmetic between `GroupBy` objects and regular DataArrays / Datasets
- Use `groupby_bins` to aggregate data in bins
- Use `resample` on time dimensions
- Use `rolling` to apply rolling aggregations
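The following sketch shows how several of these operations fit together on a synthetic daily time series. The sinusoidal "signal" is a stand-in for real data, and the window length and resampling frequency are arbitrary choices.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic daily data with a seasonal cycle (not real observations).
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
signal = np.sin(2 * np.pi * time.dayofyear.to_numpy() / 365.25)
da = xr.DataArray(signal, dims="time", coords={"time": time}, name="signal")

# Reduction per group: a monthly climatology with a new "month" dimension.
clim = da.groupby("time.month").mean()

# Arithmetic between a GroupBy object and a DataArray: anomalies from the climatology.
anom = da.groupby("time.month") - clim

# Non-reducing function applied to each group: standardize within each month.
standardized = da.groupby("time.month").map(lambda g: (g - g.mean()) / g.std())

# Resample to monthly means ("MS" labels each bin by the month start).
monthly = da.resample(time="MS").mean()

# Centered 7-day rolling mean; the first and last 3 days become NaN.
smooth = da.rolling(time=7, center=True).mean()
```

The difference between `groupby("time.month")` and `resample` is worth noticing: the former pools all Januaries together, while the latter keeps each calendar month separate.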

#### Merging and Combining Datasets

- Concatenate DataArrays and Datasets along a new or existing dimension
- Merge multiple datasets with different variables
- Add a new data variable to an existing Dataset
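A minimal sketch of these combining operations. The variable names (`temp`, `salt`, `density`), the `run` dimension, and all values are hypothetical.

```python
import numpy as np
import xarray as xr

x = [0, 1, 2]
t1 = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": x}, name="temp")
t2 = xr.DataArray([4.0, 5.0, 6.0], dims="x", coords={"x": x}, name="temp")

# Concatenate along a new dimension: the result has dims (run, x).
stacked = xr.concat([t1, t2], dim="run")

# Concatenate along an existing dimension: glue two pieces back together.
left = t1.isel(x=slice(0, 2))
right = t1.isel(x=slice(2, 3))
rejoined = xr.concat([left, right], dim="x")

# Merge objects that hold different variables into one Dataset.
salt = xr.DataArray([35.0, 35.1, 35.2], dims="x", coords={"x": x}, name="salt")
ds = xr.merge([t1, salt])

# Add a new data variable with dictionary-style assignment (toy constant value).
ds["density"] = ds["temp"] * 0 + 1025.0
```

Roughly speaking, `concat` extends data along a dimension, while `merge` collects different variables that live on compatible coordinates.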

#### Reshaping Data

- Transpose dimension order
- Swap coordinates
- Expand and squeeze dimensions
- Convert between DataArray and Dataset
- Use `stack` and `unstack` to transform data
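These reshaping operations can be sketched on a small 2 by 3 array. The names `v`, `x_km`, and `point` are invented for the example.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("y", "x"),
    coords={"y": [10, 20], "x": [1, 2, 3]},
    name="v",
)

# Transpose by dimension name rather than axis number.
transposed = da.transpose("x", "y")

# Swap a dimension for another coordinate defined along it.
da_km = da.assign_coords(x_km=("x", [0.1, 0.2, 0.3]))
swapped = da_km.swap_dims({"x": "x_km"})

# Add a length-1 dimension at the front, then remove it again.
expanded = da.expand_dims("time")
squeezed = expanded.squeeze("time")

# Convert a named DataArray to a Dataset and pull it back out.
ds = da.to_dataset()
back = ds["v"]

# Collapse (y, x) into one stacked dimension, then restore the original layout.
stacked = da.stack(point=("y", "x"))
unstacked = stacked.unstack("point")
```

Stacking is handy when a function expects 1D or 2D input, since the stacked dimension keeps enough information to unstack afterwards.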

#### Advanced Computations

- Use `differentiate` to take derivatives of data
- Use `apply_ufunc` to apply custom or specialized operations to data
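As an illustration, the sketch below differentiates a simple quadratic, where the answer is known, and wraps a small custom function with `apply_ufunc`. The helper `clip_and_scale` and its `factor` parameter are hypothetical, written only for this example.

```python
import numpy as np
import xarray as xr

# y = x**2 on a uniform grid, so dy/dx should be 2x.
x = np.linspace(0.0, 1.0, 11)
da = xr.DataArray(x**2, dims="x", coords={"x": x})

# Finite-difference derivative with respect to the coordinate "x".
# Central differences are exact for a quadratic at interior points.
deriv = da.differentiate("x")

def clip_and_scale(arr, factor):
    """Hypothetical custom operation that works on plain numpy arrays."""
    return np.clip(arr, 0.0, 0.5) * factor

# apply_ufunc calls the function on the underlying array and rewraps
# the result with the original dimensions and coordinates.
result = xr.apply_ufunc(clip_and_scale, da, kwargs={"factor": 2.0})
```

The endpoints of `deriv` use one-sided differences, so they are less accurate than the interior values.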

#### Plotting

- Show multiple line plots over a dimension using the `hue` keyword
- Create multiple 2D plots using faceting
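Both plotting goals can be sketched on a small synthetic 3D array, again assuming matplotlib is installed. The data are random numbers, and the `Agg` backend is used only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import numpy as np
import xarray as xr

# Synthetic (time, lat, lon) data; the values carry no physical meaning.
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.standard_normal((4, 3, 5)),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.arange(4),
        "lat": [0.0, 30.0, 60.0],
        "lon": np.linspace(0, 240, 5),
    },
    name="toy",
)

# One line per latitude on a single set of axes, distinguished by color.
lines = da.isel(lon=0).plot.line(x="time", hue="lat")

# Faceting: one 2D panel per time step, wrapped into rows of two.
fg = da.plot(col="time", col_wrap=2)
```

The faceted call returns a `FacetGrid` object rather than a single artist, and all panels share one color scale by default.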