Breaking arrays up into chunks for fun and science with Xarray and Dask

Project: xarray

Xarray is an n-dimensional array package that brings NumPy- and pandas-style interfaces to labelled data. Its main use is manipulating scientific datasets stored in the NetCDF file format.
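As a rough illustration of what "labelled data" means here, the sketch below builds a small array with named dimensions and coordinates. The values, coordinate ranges and the name `reading` are made up for the example and are not from the talk.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: 3 daily readings on a 2 x 2 latitude/longitude grid.
values = np.arange(12, dtype=float).reshape(3, 2, 2)
readings = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2020-01-01", periods=3),
        "lat": [-35.0, -35.5],
        "lon": [149.0, 149.5],
    },
    name="reading",
)

# Select by coordinate label rather than integer position (pandas-style).
northern_row = readings.sel(lat=-35.0)

# Reduce over a named dimension rather than an axis number (NumPy-style).
time_mean = readings.mean(dim="time")
```

Working with dimension names instead of axis numbers is what keeps this kind of code readable once arrays have more than two or three dimensions.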

The Dask package uses task graphs to compute on arrays in parallel: it breaks each array up into smaller chunks and processes them lazily. It can handle larger-than-memory datasets, scaling from a single machine to a cluster.
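The chunked, lazy model described above can be sketched with `dask.array`. The array shape and chunk size below are arbitrary choices for illustration; a real workload would pick chunks to suit its data and memory.

```python
import dask.array as da

# A 2000 x 2000 array split into 500 x 500 chunks (a 4 x 4 grid of chunks).
x = da.ones((2000, 2000), chunks=(500, 500))

# Building the expression only records tasks in a graph; nothing is computed yet.
total = (x * 2).sum()

# compute() executes the graph, working through the chunks in parallel where possible.
result = total.compute()
```

Because each task only touches one chunk at a time, the same pattern can apply to arrays that would not fit in memory as a whole.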

Used together, they can analyse all sorts of scientific data. This talk will look at using them to analyse a time series of Earth-observation data from Landsat satellites.
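A minimal sketch of the two libraries working together is shown below. It uses random numbers as a stand-in for satellite imagery; the grid size, time spacing, chunk sizes and the name `reflectance` are assumptions made for the example, not details of the analysis presented in the talk.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a satellite time series: 24 time steps on a 100 x 100 grid.
# Real Landsat data would be loaded from files instead.
rng = np.random.default_rng(0)
cube = xr.DataArray(
    rng.random((24, 100, 100)),
    dims=("time", "y", "x"),
    coords={"time": pd.date_range("2020-01-01", periods=24, freq="16D")},
    name="reflectance",
)

# Back the array with Dask: keep all time steps together, tile space into 50 x 50 blocks.
chunked = cube.chunk({"time": -1, "y": 50, "x": 50})

# Lazy: this only builds a task graph for the per-pixel mean through time.
mean_image = chunked.mean(dim="time")

# Run the graph and load the result into memory.
mean_loaded = mean_image.compute()
```

Keeping the whole time axis in one chunk suits per-pixel statistics through time, since each spatial tile can then be reduced independently.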

Andrew Hicks

Andrew is a developer from Canberra who works on software projects relating to scientific and geospatial data. He has been using Python for analysis of environmental data. Andrew is currently working at Geoscience Australia on an open-source system to let scientists easily access and manipulate Earth-observation data through Python.
