Final Projects#

When taught at Columbia, the capstone of this course has been a final project. These final projects are archived here. Feel free to open a [pull request] to add your class’s projects to the list.

Project Archive#

Columbia, Fall 2019#

  • Deep earthquakes at Axial Seamount by Michelle Lee github Binder

  • Correlation between sea surface temperature and sea ice from 1990 by Shengtao Wang github Binder

  • An analysis of eddies in the Bay of Bengal by Shannon Bohman github Binder

  • The variation trend and distribution study of CO2 concentration and temperature in North America by Lin Xing github Binder

  • The correlation of global distribution of soil moisture and precipitation by Jiahao Zhang github Binder

  • Temperature Polarization in Different Membrane Distillation Operations by Stephanie McCartney github Binder

  • PTLC vs S&P500 by Thomas Kovar github Binder

  • Are Trees in the South Eastern Brooks Range suffering from Temperature-induced Drought Stress? by Rose Oelkers github Binder

  • Analysis of the risks to submarine cables by Amelie Latreille github Binder

  • Recent Trends in Global Daily Temperature Variability by Casey Ivanovich github Binder

  • Aerosol Change Effect on China’s Precipitation by Patrick Cho github Binder

  • Could CSIF better reveal drought mechanisms? by Weiwei Zhan github Binder

  • Seasonality in Salinity, Temperature, and Currents of the North Atlantic by Joohee Kim github Binder

  • Analyzing the total emissions and the spatial distributions of the different greenhouse gases in the U.S. by Haokai Zhao github Binder

  • Sediment provenance variations in Southern Africa during the last 150 kyr by Chiza Mwinde github Binder

  • The affect of geopotential height anomalies on climate extremes by Patric Ryser github Binder

  • Local Effects of Farmland Irrigation on Groundwater Availability by Matt Harrington github Binder

  • Greenhouse gas ecosystem fluxes in the YK Delta of Alaska by Sarah Ludwig github Binder

  • Quantifying dynamic topography change in Argentina based on model results by Andrew github Binder

  • Implied Uncertainty in Impacts Projections from Internal Model Variability by Kevin Schwarzwald github Binder

  • Effect of Wind Extremes on Aboveground Biomass: Case Study of Hurricanes Maria and Irma by Jashvina Devadoss github Binder

Final Project Requirements#

For instructors wishing to re-create the final project experience, we include below the specifications used in past iterations of the course.

Learning Goals#

The goal of the final project is to assess your ability to combine and apply the skills you have learned in class in the context of a real-world research problem. Our class has mostly focused on tools for data analysis and visualization, so this must be the focus of your final project. Specifically, we seek to assess your ability to do the following tasks:

  • Discover and download real datasets in standard formats (e.g. CSV, netCDF)

  • Load the data into Pandas or Xarray, performing any necessary data cleanup (dealing with missing values, proper time encoding, etc.) along the way.

  • Perform realistic scientific calculations involving, for example, grouping, aggregating, and applying mathematical formulas.

  • Visualize your results in well-formatted plots.
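As a rough illustration of the loading, cleanup, and grouping steps listed above, here is a minimal Pandas sketch. The CSV text, the column names (`date`, `station`, `temperature_c`), and the `-999` missing-value sentinel are all made up for this example; a real project would read a downloaded file instead.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a downloaded file.
# The value -999 is an assumed sentinel for missing measurements.
RAW_CSV = """date,station,temperature_c
2019-01-15,A,1.0
2019-01-20,A,3.0
2019-02-10,A,-999
2019-02-18,A,5.0
2019-01-05,B,10.0
2019-02-07,B,12.0
"""


def load_and_clean(csv_text):
    """Load the CSV, parse the date column, and turn the sentinel into NaN."""
    return pd.read_csv(
        io.StringIO(csv_text),
        parse_dates=["date"],
        na_values=["-999"],
    )


def monthly_station_means(df):
    """Group by station and calendar month, then average the temperature."""
    month = df["date"].dt.month.rename("month")
    return df.groupby(["station", month])["temperature_c"].mean()


df = load_and_clean(RAW_CSV)
means = monthly_station_means(df)
print(means)
```

The same pattern (load, mask missing values, group, aggregate) carries over to Xarray for gridded data, although the function names differ.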

Dataset Requirements#

Your datasets can involve data collected by yourself or your lab, or can come from a public data repository. Ideally, your choice of dataset should be driven by your research. It is acceptable (and encouraged) to have your final project for this class involve an ongoing research project. However, it is not acceptable to have your final project overlap with the final project for another class.

If you don’t know what dataset to use, here are a couple of links to get you started:

You may use just one dataset, or you may choose to combine multiple datasets, depending on your scientific question.

Analysis Requirements#

The goal here is the same as with any science project: to use the data to investigate a scientific question or hypothesis. In order to succeed on the project, you will have to draw on your experience outside our class, from your science-focused classes or independent research, in order to define a scientifically interesting question. It is also acceptable to use this project to reproduce the results from a published study that you find interesting, provided you have access to the original data.

Whatever you choose, you should clearly define a hypothesis or scientific question that you aim to investigate with your analysis. This will determine what you have to do.

The results of this analysis will be figures: beautiful figures that clearly answer your question or hypothesis. Your notebook should contain at least 4 and no more than 8 figures. If you have closer to 4, they should be complex, multi-panel figures. All figures must have titles, clearly labeled axes, and, where appropriate, informative colormaps / colorbars and legends.
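To make the figure requirements concrete, the following is one possible sketch of a two-panel Matplotlib figure with titles, labeled axes, a legend, and a colorbar. The data are synthetic and the labels are placeholders chosen for this example; it is not a template that projects are required to follow.

```python
import matplotlib.pyplot as plt
import numpy as np


def make_two_panel_figure():
    """Build a two-panel figure from synthetic data (illustrative only)."""
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    noisy = np.sin(x) + 0.1 * rng.standard_normal(x.size)
    field = np.outer(np.sin(x), np.cos(x))  # synthetic 2D field in [-1, 1]

    fig, (ax_line, ax_map) = plt.subplots(1, 2, figsize=(10, 4))

    # Left panel: line plot with a legend.
    ax_line.plot(x, noisy, label="noisy signal")
    ax_line.plot(x, np.sin(x), label="sin(x)")
    ax_line.set_title("Synthetic time series")
    ax_line.set_xlabel("x (arbitrary units)")
    ax_line.set_ylabel("amplitude")
    ax_line.legend()

    # Right panel: 2D field with a diverging colormap centered on zero.
    mesh = ax_map.pcolormesh(
        x, x, field, cmap="RdBu_r", vmin=-1, vmax=1, shading="auto"
    )
    ax_map.set_title("Synthetic 2D field")
    ax_map.set_xlabel("x (arbitrary units)")
    ax_map.set_ylabel("y (arbitrary units)")
    fig.colorbar(mesh, ax=ax_map, label="field value")

    fig.tight_layout()
    return fig


fig = make_two_panel_figure()
```

A diverging colormap with symmetric limits is used here because the synthetic field is centered on zero; a sequential colormap would usually suit strictly positive quantities better.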

Technical Requirements#

Your final project must meet the following technical requirements:

  • A single Jupyter notebook

  • Stored in a standalone public GitHub repo

  • All data is either stored in the repo itself or downloaded / accessed from within the notebook (no manual download steps)

  • Complete explanatory text / equations included in the notebook as markdown cells

  • Notebook must execute in sequence with no errors

  • The whole GitHub repo must be configured to run on mybinder.org or, for analysis involving dask, binder.pangeo.io
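One way to satisfy the "no manual download steps" requirement is a small helper that fetches a file only when it is not already present, so the notebook can be re-run without repeating the download. The sketch below uses only the standard library; the `fetch` helper and the file names are hypothetical, and the demonstration reads from a local file URI so that it runs offline. In a project notebook the URL would point at the public dataset instead.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch(url, dest):
    """Download url to dest unless dest already exists; return the path."""
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest


# Offline demonstration: a temporary local file plays the role of the
# remote dataset, addressed through a file:// URI.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("time,value\n2019-01-01,1.5\n")

local_copy = fetch(source.as_uri(), workdir / "data" / "source.csv")
print(local_copy.read_text())
```

Because the helper skips files that already exist, running all cells again reuses the cached copy rather than downloading it a second time.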

You must use either Pandas or Xarray (or both) in some part of your project. You may also use other scientific Python libraries to facilitate analysis that is not possible with Xarray / Pandas alone. Some libraries you may wish to consider are:

  • SciPy for interpolation, signal processing, spectral analysis, linear algebra, and other general purpose scientific computing routines

  • Statsmodels for advanced statistical analysis

  • Scikit-image for image processing

  • Scikit-learn for machine learning

  • XGCM for working with finite-volume data from general circulation models

  • XESMF for regridding of gridded data

  • Pyresample for resampling (reprojection) of satellite data

  • EOFS for empirical orthogonal function analysis

  • windspharm for spherical harmonic analysis of global atmospheric wind data

  • MetPy for reading, visualizing, and performing calculations with weather data

Project Approval#

You must have your dataset(s) and the general scope of your project approved by the instructor. The approval process works like this:

  • Create a new public GitHub repo for your project

  • Add a README.md file which contains the scientific question / hypothesis you plan to investigate, links to the relevant datasets, and a three-sentence summary of the analysis you plan to do.

  • Submit a link to your project repo using the method indicated by the instructor.

In-Class Presentations#

You are asked to give a 5-minute presentation about your project. Do not prepare any slides. Instead, make your presentation by opening your notebook from GitHub on the presentation computer and walking us through parts of it. Your presentation should be concise and cover the following topics:

  • What data did you analyze and how did you load it?

  • What is the most interesting figure you made? (Show us the figure.)

  • What was the biggest challenge you faced in completing your project?

Don’t forget to make your project repo public; otherwise we won’t be able to see it.