Cloud Compute
As mentioned in the introduction, cloud computing is all about moving geospatial analysis and computation from your local machine to a remote machine in the cloud. This approach has several advantages over traditional desktop computing.
Advantages of Cloud Computing
- With cloud computing, you can avoid the upfront cost and complexity of owning and maintaining your own IT infrastructure
- Cloud computing allows groups or individuals to scale up (or down) their operations quickly as their computing needs change
- Cloud computing allows users to access their data and applications from anywhere, on any device, at any time
- Geospatial work in the cloud lets colleagues work directly together on the same data, models, and applications, in the same way that Google Docs lets multiple people edit the same document at the same time
Google Earth Engine
Google Earth Engine (GEE) is a cloud-based platform that enables users to process and analyze geospatial data. It provides access to a massive collection of satellite imagery, weather data, and other geospatial datasets. With Earth Engine, users can visualize data, create maps, and run geospatial analyses. This video gives a gentle introduction to GEE and is aimed at non-coders.
Here are some of the key features of GEE:
- 70 PB of data across 800+ curated geospatial datasets, including 40 years of satellite imagery. Check out the Awesome GEE Community Data Catalog for a curated list of community-contributed datasets.
- Cloud-based processing: Earth Engine is a cloud-based platform, so users can process data without having to set up or maintain their own infrastructure. This makes it easy to get started with Earth Engine and to scale your analysis as needed.
- Collaborative coding: you can share your code with others and work on scripts together.
- 250 GB of free storage for your own data.
- Code is usually written and run in the browser-based JavaScript Code Editor, but you can also use the Earth Engine Python API to work with GEE from your own IDE.
- You can create GEE "apps", which are lightweight web applications for interacting with data. Check out this example app showcasing drone imagery.
- Earth Blox is a no-code interface for using GEE. It is commercial software, but it is offered at a reduced price for academic use.
Microsoft Planetary Computer
The Microsoft Planetary Computer (PC) is a platform that lets users leverage the power of the cloud to accelerate environmental sustainability and Earth science. It is a competitor to Google Earth Engine, but runs on Microsoft Azure cloud infrastructure.
PC has a large Data Catalog. Its datasets are indexed using the SpatioTemporal Asset Catalog (STAC) specification and can be searched through a standard STAC API.
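To make the STAC idea concrete, here is a minimal sketch of a STAC Item written as a plain Python dictionary, plus a hypothetical helper that keeps only the items whose bounding box contains a given point. The item id, collection name, coordinates, datetime, and asset URL are illustrative placeholders rather than real Planetary Computer records, and a real STAC API evaluates `intersects` against the item's full geometry, not just its bbox.

```python
# A minimal, hand-written STAC Item. All values are illustrative placeholders;
# real catalog items carry many more properties and assets.
stac_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "collection": "example-collection",
    # Bounding box order: [min_lon, min_lat, max_lon, max_lat]
    "bbox": [-113.0, 36.0, -112.0, 37.0],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-113.0, 36.0], [-112.0, 36.0], [-112.0, 37.0],
                         [-113.0, 37.0], [-113.0, 36.0]]],
    },
    "properties": {"datetime": "2000-01-01T00:00:00Z"},
    "assets": {
        "elevation": {"href": "https://example.com/elevation.tif",
                      "type": "image/tiff; application=geotiff"},
    },
    "links": [],
}


def bbox_contains_point(bbox, lon, lat):
    """Return True if (lon, lat) lies inside a 2D bbox (no antimeridian handling)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def filter_items_by_point(items, lon, lat):
    """Keep only the items whose bbox contains the point (a rough prefilter)."""
    return [item for item in items if bbox_contains_point(item["bbox"], lon, lat)]


matches = filter_items_by_point([stac_item], -112.107676, 36.101690)
print([item["id"] for item in matches])
```

Because every dataset in the catalog shares this structure (geometry, bbox, datetime, assets), the same search code works across collections.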
Using the Planetary Computer Hub, you can choose how to interact with the Planetary Computer:
- You can interact with it using Python or R in a Jupyter Notebook environment. Notebooks launch with a Pangeo Notebook image, which comes preloaded with many geospatial Python libraries. You can also access the Planetary Computer from your local installation of VS Code.
- You can open a QGIS instance on the Planetary Computer and load data using the STAC plugin.
- You can also access the Planetary Computer Data Catalog from ArcGIS Pro.
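If you want to confirm that the libraries your workflow relies on are actually available in a Hub or local environment, a quick check like the one below can help. The list of module names is an assumption; edit it to match your own needs.

```python
import importlib.util

# Hypothetical list of modules to look for; adjust to your own workflow.
EXPECTED_LIBRARIES = [
    "xarray", "rioxarray", "geopandas", "dask",
    "pystac_client", "planetary_computer",
]


def check_libraries(names):
    """Map each top-level module name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in check_libraries(EXPECTED_LIBRARIES).items():
        print(f"{name:20s} {'installed' if available else 'missing'}")
```

`find_spec` only locates a module without importing it, so the check is fast and has no side effects.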
Python Libraries for Planetary Computer
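```shell
pip install pystac-client planetary-computer
```

```python
import pystac_client
import planetary_computer

# Open the Planetary Computer STAC API; sign_inplace signs the asset URLs of
# returned items so the underlying files can be read directly.
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/",
    modifier=planetary_computer.sign_inplace,
)

# GeoJSON point given as (longitude, latitude)
point = {"type": "Point", "coordinates": [-112.107676, 36.101690]}

# Search the NASADEM elevation collection for items intersecting the point
search = catalog.search(collections=["nasadem"], intersects=point, limit=1)
item = next(search.items())  # items() replaces the deprecated get_items()
print(item)
```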
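Under the hood, `catalog.search` talks to a STAC API. The standard-library sketch below builds a STAC API Item Search body with the same parameters as the example above and prepares, but does not send, a JSON POST request to the `/search` endpoint. It also shows conceptually what URL signing amounts to: appending a token to an asset URL as a query string. `append_sas_token` is a hypothetical stand-in, not the `planetary_computer` implementation, and the token and blob URL are placeholders.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# Item Search endpoint of the Planetary Computer STAC API (not contacted here)
STAC_SEARCH_URL = "https://planetarycomputer.microsoft.com/api/stac/v1/search"


def build_search_body(collections, lon, lat, limit=1):
    """Build a STAC API Item Search body for a point-intersection query."""
    return {
        "collections": list(collections),
        # GeoJSON geometry; coordinates are (longitude, latitude)
        "intersects": {"type": "Point", "coordinates": [lon, lat]},
        "limit": limit,  # page size requested from the server
    }


def build_search_request(body):
    """Prepare, without sending, a JSON POST request to the search endpoint."""
    return urllib.request.Request(
        STAC_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def append_sas_token(href, token):
    """Hypothetical stand-in for signing: add a token to the URL's query string."""
    parts = urlsplit(href)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


body = build_search_body(["nasadem"], -112.107676, 36.101690)
request = build_search_request(body)
print(request.get_method(), request.full_url)
print(json.dumps(body))
print(append_sas_token("https://example.blob.core.windows.net/data/tile.tif",
                       "sv=placeholder&sig=placeholder"))
```

In practice, `pystac_client` handles the request and paging for you, and `planetary_computer` fetches real short-lived tokens, so this is only meant to show what those libraries are doing on your behalf.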
Integrated Development Environments (IDE)
IDEs are graphical user interfaces (GUIs) for working with code and data.
Modern IDEs can work with cloud services and cloud-native data formats with only a few additional extensions.
The most popular IDE for software development is Microsoft's Visual Studio Code (VS Code). VS Code has extensions for many geospatial applications and can connect to remote machines and large compute clusters in the cloud.
RStudio and Project Jupyter are two other widely used IDEs for working with code and data in research. RStudio focuses mostly on the R programming language, but also supports Python.
Project Jupyter is focused on Python but supports almost every programming language through kernels. Several cloud platforms have adopted Project Jupyter as the default IDE for on-demand virtual machines.
QGIS is the most popular open source desktop GIS application and has native features that let users load cloud-optimized and analysis-ready data.
Cloud Workbenches
CyVerse Discovery Environment - a multi-platform service for full-stack cloud data management.
DesignSafe CI - a workbench for natural hazards research.
HydroShare - a platform for hydrological science applications, supported by CUAHSI.
OSF.io - a free, open source web application that connects and supports the research workflow, helping scientists make their research more efficient and effective.
RStudio Workbench - an integrated development environment for R, a programming language for statistical computing and graphics. RStudio is available in two formats: RStudio Desktop is a regular desktop application, while RStudio Server runs on a remote server and lets you access RStudio from a web browser.
On-Demand Cloud-based IDE
Several services, most with a free tier, run virtual machines in the cloud that you can start and work in directly from your browser. These include Google's Colaboratory (Colab), the MyBinder project, GitHub Codespaces, and Gitpod.
Colab - Google's Colaboratory starts a Jupyter Notebook; resources are limited but can be increased with a paid subscription.
Codespaces - GitHub Codespaces starts a VS Code instance from a GitHub repository on a machine that can be variably sized (larger machines and extended use require a paid plan).
Gitpod - starts a VS Code instance that can be variably sized.
MyBinder.org - starts a defined environment (RStudio, Jupyter, or VS Code) from a GitHub repository; resources are limited.
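Several of these services can be launched from a link that points at a GitHub repository. The hypothetical helpers below build such links for MyBinder, Colab, and Gitpod; the organization, repository, branch, and notebook names are placeholders, and it is worth checking each service's documentation for the current URL format.

```python
from urllib.parse import quote


def binder_url(org, repo, ref="HEAD", labpath=None):
    """Build a mybinder.org launch link for a GitHub repository."""
    url = f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}"
    if labpath:
        # Open a specific file in JupyterLab once the environment has started
        url += f"?labpath={quote(labpath)}"
    return url


def colab_url(org, repo, notebook_path, branch="main"):
    """Build a Colab link that opens a notebook stored in a GitHub repository."""
    return (f"https://colab.research.google.com/github/"
            f"{org}/{repo}/blob/{branch}/{notebook_path}")


def gitpod_url(org, repo):
    """Build a Gitpod link by prefixing the repository URL."""
    return f"https://gitpod.io/#https://github.com/{org}/{repo}"


print(binder_url("example-org", "example-repo", labpath="notebooks/demo.ipynb"))
print(colab_url("example-org", "example-repo", "notebooks/demo.ipynb"))
print(gitpod_url("example-org", "example-repo"))
```

Links like these are commonly placed as badges in a repository README so that readers can open the same environment with one click.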