Julia hackathon v6.0 2024 - Outcome

We held our sixth GPU4GEO Julia hackathon on October 07-11, 2024 in the Black Forest (DE), focussing on a wide range of Julia topics. Below is a glimpse into the progress made by some participants on various Julia-related projects, along with some visual impressions.

🚧 more news to come!

Chmy.jl - Finite differences and staggered grids

You Wu, Ivan Utkin, Ludovic Räss

It has been a fruitful week: we restructured the package and extended the documentation of Chmy.jl, with a focus on its distributed usage.

In order to allow users to use all submodules with a single using Chmy statement, we refactored the API to export symbols in submodules explicitly as addressed in PR #51. Instead of relying on an external package such as Reexport.jl, we decided to manually export all relevant symbols to avoid unnecessary package dependencies.
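The manual re-export pattern can be sketched as follows. This is a minimal illustration only: the module names ToyChmy, Grids and Fields and the functions spacing and zero_field are hypothetical and do not reflect the actual layout of Chmy.jl.

```julia
module ToyChmy                      # hypothetical parent package

module Grids                        # hypothetical submodule
export spacing
spacing(len, n) = len / n           # cell size for n cells over length len
end

module Fields                       # hypothetical submodule
export zero_field
zero_field(n) = zeros(n)            # field of n zeros
end

using .Grids, .Fields               # bring the submodule exports into the parent
export spacing, zero_field          # re-export them by hand, without Reexport.jl

end # module ToyChmy

using .ToyChmy                      # a single statement exposes both submodules' symbols

h = spacing(1.0, 10)
T = zero_field(4)
```

The price of avoiding the extra dependency is that the export list in the parent module has to be kept in sync with the submodules by hand.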

With PR #56, we aim to provide comprehensive yet beginner-friendly documentation on the distributed usage of Chmy.jl. To this end, the section Distributed gives a conceptual introduction to distributed computing in general. More experienced users can start directly with a simple script that solves a 2D diffusion example, found in the section Using Chmy.jl with MPI.

Convection code

Paul Tackley

A Julia program for convection in a spherical annulus. It solves the 2D spherical annulus variable-viscosity equations given in Hernlund & Tackley (2008) on a staggered grid using a direct solver. Some anomalous behaviour is observed relative to the test cases reported in that paper, so more testing and debugging is needed. Once this is resolved, the code will be posted online for general use.

Annulus convection

Permeability in GeoParams

Pascal Aellig, Jacob Frasunkiewicz

Over the course of the week, we discussed and added permeability laws to GeoParams.jl. There are currently four laws that can be added to and called from the MaterialParams structure. Part one of many has been merged in PR #225, so stay tuned for more over the next few weeks as we implement computational routines to facilitate the writing of two-phase codes.

Implicit solvers with Enzyme.jl

Lorenzo Candioti, Valentin Churavy

We developed a workflow to solve partial differential equations (PDEs) with implicit schemes using the automatic differentiation package Enzyme.jl. Using Enzyme to solve PDEs typically involves writing out the equations in residual form and differentiating this residual function with respect to the solution variable. The resulting Jacobian-vector product (JVP) or vector-Jacobian product (VJP) is then used to assemble the sparse Jacobian needed to solve the equations. The newly developed workflow relies on Krylov solvers, which only need the JVP (or VJP) as input to solve the system of equations, and thus avoids the computationally expensive assembly of the full Jacobian. Tested on a simple 1D diffusion equation, the new workflow is ca. 1.5x faster than the full Jacobian assembly approach (989 ms vs. 1.43 s in the timings below).

                        Matrix-free
──────────────────────────────────────────────────────────────────────
                             Time                    Allocations
                    ───────────────────────   ────────────────────────
 Tot / % measured:       989ms / 100.0%           1.07MiB /  71.3%

Section     ncalls     time    %tot     avg     alloc    %tot      avg
──────────────────────────────────────────────────────────────────────
iteration        1    989ms  100.0%   989ms    785KiB  100.0%   785KiB
  gmres          9    988ms  100.0%   110ms   78.9KiB   10.1%  8.77KiB
    jvp      43.3k    276ms   28.0%  6.38μs     0.00B    0.0%    0.00B
  forward       10   82.2μs    0.0%  8.22μs     0.00B    0.0%    0.00B
  inc            9   44.4μs    0.0%  4.93μs     0.00B    0.0%    0.00B
──────────────────────────────────────────────────────────────────────

                        Jacobian assembly
───────────────────────────────────────────────────────────────────────
                              Time                    Allocations
                     ───────────────────────   ────────────────────────
  Tot / % measured:       1.43s / 100.0%           56.7MiB /  99.2%

Section      ncalls     time    %tot     avg     alloc    %tot      avg
───────────────────────────────────────────────────────────────────────
iteration         1    1.43s  100.0%   1.43s   56.2MiB  100.0%  56.2MiB
  assembly        9    1.40s   98.2%   156ms   15.0MiB   26.6%  1.66MiB
    jvp       90.0k    625ms   43.8%  6.94μs     0.00B    0.0%    0.00B
  solve           9   24.8ms    1.7%  2.76ms   40.5MiB   72.1%  4.50MiB
  forward        10   73.0μs    0.0%  7.30μs     0.00B    0.0%    0.00B
───────────────────────────────────────────────────────────────────────
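
The matrix-free idea behind these timings can be illustrated with a small, dependency-free sketch. It is not the code used at the hackathon and makes several assumptions: the JVP is obtained by complex-step differentiation as a stand-in for Enzyme.jl's forward mode, a hand-written conjugate gradient replaces GMRES (valid here because the Jacobian of this particular backward-Euler discretisation is symmetric positive definite), and the function names, grid size and time step are all hypothetical.

```julia
using LinearAlgebra

# Backward-Euler residual of 1D diffusion with u = 0 at both ends (assumed setup).
function residual(u, u_old, D, dx, dt)
    n = length(u)
    r = similar(u)
    for i in 1:n
        ul = i == 1 ? zero(u[i]) : u[i-1]
        ur = i == n ? zero(u[i]) : u[i+1]
        r[i] = (u[i] - u_old[i]) / dt - D * (ul - 2 * u[i] + ur) / dx^2
    end
    return r
end

# Jacobian-vector product J(u)*v via complex-step differentiation
# (a stand-in for automatic differentiation, not Enzyme.jl API).
jvp(f, u, v; h=1e-20) = imag.(f(u .+ (im * h) .* v)) ./ h

# Minimal conjugate gradient; A is a function returning A*p, no matrix is stored.
function cg(A, b; tol=1e-10, maxiter=10 * length(b))
    x = zero(b)
    r = copy(b)
    p = copy(r)
    rs = dot(r, r)
    for _ in 1:maxiter
        Ap = A(p)
        α = rs / dot(p, Ap)
        x .+= α .* p
        r .-= α .* Ap
        rs_new = dot(r, r)
        sqrt(rs_new) <= tol * norm(b) && break
        p .= r .+ (rs_new / rs) .* p
        rs = rs_new
    end
    return x
end

n, D, dt = 50, 1.0, 1e-3
dx = 1 / (n + 1)
x = dx .* (1:n)
u_old = sin.(π .* x)
f(u) = residual(u, u_old, D, dx, dt)

# Matrix-free Newton step: the Krylov solver only ever calls the JVP.
u = copy(u_old)
δ = cg(v -> jvp(f, u, v), -f(u))
u_new = u .+ δ

# Reference: assemble the dense Jacobian from n JVPs (one per unit vector), then solve.
J = reduce(hcat, [jvp(f, u, Float64.((1:n) .== j)) for j in 1:n])
u_ref = u .- J \ f(u)
```

In this sketch the assembled variant always spends n JVP calls per Newton step, whereas the matrix-free variant spends one per Krylov iteration, which is the trade-off visible in the jvp call counts above.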

Using Enzyme.jl to calculate adjoint sensitivities within JustRelax.jl

Christian Schuler, Valentin Churavy, Albert de Montserrat, Pascal Aellig

ParallelStencil.jl has been made compatible with the latest Enzyme.jl version (PR #169 and PR #170). With the help of Enzyme.jl and ParallelStencil.jl, the necessary vector-Jacobian products (VJP) for the adjoint solve in JustRelax.jl can now be calculated. Work has also been done to make the adjoint solve run on multiple GPUs/CPUs. The figure shows a viscoelastic falling block example with adjoint sensitivities w.r.t. the viscosity and density.

FallingBlock
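
To show where the VJP enters an adjoint sensitivity calculation, here is a toy sketch on a small linear problem. It does not use JustRelax.jl, ParallelStencil.jl or Enzyme.jl: the residual r(u, p) = K*u + p.*u - b, the objective J = c'*u and all names are hypothetical, and the derivatives that Enzyme.jl would provide are written out by hand.

```julia
using LinearAlgebra

n = 20
# Fixed, symmetric positive definite operator (assumed toy problem).
K = diagm(0 => fill(2.0, n), 1 => fill(-1.0, n - 1), -1 => fill(-1.0, n - 1))
b = ones(n)
c = collect(range(0.0, 1.0; length=n))   # weights of the objective J = c' * u

forward(p) = (K + Diagonal(p)) \ b       # solves r(u, p) = K*u + p.*u - b = 0
objective(p) = dot(c, forward(p))

p = fill(0.5, n)
u = forward(p)

# Adjoint solve: (∂r/∂u)' * λ = ∂J/∂u = c
λ = (K + Diagonal(p))' \ c

# VJP of the residual w.r.t. the parameters: λ' * ∂r/∂p, with ∂r/∂p = Diagonal(u).
# The sensitivity of J w.r.t. every entry of p follows from this single adjoint solve.
grad = -(λ .* u)

# Central finite-difference check for one parameter (index chosen arbitrarily).
k, h = 7, 1e-6
pp = copy(p); pp[k] += h
pm = copy(p); pm[k] -= h
fd = (objective(pp) - objective(pm)) / (2h)
```

One adjoint solve yields the gradient with respect to all n parameters, whereas the finite-difference check needs two forward solves per parameter.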

New Metal backend for ParallelStencil.jl

Giacomo Aloisi, Samuel Omlin

During the week, we implemented a Metal.jl backend for ParallelStencil.jl! 🎉🎉🎉 This will allow users to run ParallelStencil on Apple silicon GPUs, such as those in the M1, M2 and M3 chips found in recent Mac laptops, for an amazing speedup!

The changes are in PR #175, so stay tuned for a new release soon!