
Big Picture(s)

Last month, I added some photos to this blog. They’re a bit on the older side, but I previously only had them up on a Facebook page, and having finally made the effort to eliminate my Facebook presence, I wanted to migrate them to infrastructure I own.

There are only six sets of photos, with ~24 per reel. 128 total photos shouldn’t be too bad, right? My first thought was to just store the assets directly in the repo. I copied over the assets, set up a Git LFS rule to match, threw together a UI using the Astro.glob() API + @astrojs/image, and deployed — build failure, even though it had built fine on my local machine. What? Digging deeper into the GitLab Pages documentation, it seems the default maximum size for a site is 100MB. A quick du showed my site to be over 1GB. I raised the limit in the GitLab admin UI (as well as, after some trial & error, the limit for GitLab’s CI/CD artifacts), and my photos were live.
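For context, that setup amounts to something like this (a rough sketch, not the exact code; the glob path and markup are illustrative):

---
import { Image } from "@astrojs/image/components";

// each glob result is a module whose default export is the image's metadata
// (src, width, height, format), so dimensions come for free with local files
const photos = await Astro.glob("../assets/photos/**/*.jpg");
---
<ul>
  {photos.map((photo) => (
    <li>
      <Image src={photo.default} alt="" />
    </li>
  ))}
</ul>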

S3

Great! …but. Since the site weighed over 1GB on disk, it would continue to weigh that much with every deploy. Writing a post, making a style update — another 1GB for GitLab to store.

As an added detriment, the site took FOREVER to build. On my local machine, it was tolerable, but GitLab’s runner is subject to hard CPU and memory limits, so generating 4(-ish) optimized versions of each asset was time-consuming for the CI/CD pipeline.

I fixed the size issue first, as the solution seemed simple. GitLab’s storage is already backed by an RGW cluster (self-hosted S3-compatible object storage), so I just tossed the assets in a new bucket, exposed them as world-readable with an S3 policy, and created a Traefik IngressRoute at media.blog.ezracelli.dev.

Of course, I ran into a small speed bump; @astrojs/image needs to know each image’s dimensions, either via its width + height or one of the two dimensions + its aspect ratio. This is used to generate width and height HTML attributes on the <img> tags. When the images are stored locally on disk, it has a quality-of-life mechanism to determine these directly from the file; such a feature unfortunately doesn’t exist for “remote” images. This was also relatively simple to fix — I already had to store some metadata about each image (its title and URL), so I just added the image’s aspect ratio to that JSON.
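In other words, each entry carries just enough for @astrojs/image to do its math. Something along these lines (a sketch; the field names, file layout, and width are illustrative):

---
import { Image } from "@astrojs/image/components";
// e.g. [{ "title": "...", "url": "https://media.blog.ezracelli.dev/xxx.jpg", "aspectRatio": 1.5 }, ...]
import photos from "../data/photos/reel-one.json";
---
{photos.map((photo) => (
  <Image
    src={photo.url}
    width={1200}
    aspectRatio={photo.aspectRatio}
    format="jpeg"
    alt={photo.title}
  />
))}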

SSR in astro

This worked just fine, but the build-time issue persisted (the size of the built site still wasn’t where I wanted it, either, but I was willing to live with that). Here’s what I was aiming for:

  • A given asset is available on media.blog.ezracelli.dev at /xxx.jpg.
  • A request comes in for /xxx.jpg that also specifies some resizing dimensions, maybe via query or path parameters (for example, /xxx.jpg?w=1200).
  • Generate a reproducible filename (maybe a hash of the path + relevant parameters?)
  • If that filename exists in the bucket, serve it! If not, generate it on the fly, and store it back before serving it.

This all sounds well and good, and I’m perfectly comfortable building a tool that could do this (possibly also using sharp, which is what @astrojs/image uses under the hood). I really liked the idea of structuring this as a serverless function.
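A sketch of what I had in mind, using sharp and the AWS SDK (the helper name, bucket layout, and endpoint are all hypothetical):

import { createHash } from "node:crypto";
import { GetObjectCommand, PutObjectCommand, S3Client } from "@aws-sdk/client-s3";
import sharp from "sharp";

const s3 = new S3Client({ endpoint: "https://rgw.example.internal", forcePathStyle: true });
const BUCKET = "blog-media";

// serve a cached resized variant if one exists; otherwise generate it,
// store it back into the bucket, and serve it
export async function resize(path: string, width: number): Promise<Buffer> {
  // reproducible filename: a hash of the source path + the relevant parameters
  const key = `cache/${createHash("sha256").update(`${path}?w=${width}`).digest("hex")}.jpg`;

  try {
    const cached = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: key }));
    return Buffer.from(await cached.Body!.transformToByteArray());
  } catch {
    // cache miss: pull the original, resize it, and store the result back
    const original = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: path }));
    const resized = await sharp(Buffer.from(await original.Body!.transformToByteArray()))
      .resize({ width })
      .toBuffer();

    await s3.send(
      new PutObjectCommand({ Bucket: BUCKET, Key: key, Body: resized, ContentType: "image/jpeg" }),
    );

    return resized;
  }
}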

astro supports SSR (read: serverless) routes, but it quickly became clear that this wasn’t viable. To use SSR routes, astro requires setting the entire site to SSR and then marking individual pages as pre-renderable. Its generated output (at least with the @astrojs/node adapter) looks like this:

public/
└─ dist/
   ├─ client/
   │  ├─ index.html
   │  ├─ favicon.ico
   │  ├─ robots.txt
   │  └─ (other assets)
   └─ server/
      ├─ entry.js
      └─ (other assets)

If all your pages are marked as pre-renderable, the content in public/dist/client is exactly the same as what’s in the public directory — without the images. As it turns out, @astrojs/image is smart enough to realize that if you’re going to use SSR, you likely won’t want to spend time pre-rendering all your images at build time (exactly what I want!), but it doesn’t do any caching in this mode, so each asset must be regenerated for every request. This site doesn’t get many views, but that still seems… bad.
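For reference, that “whole site SSR, individual pages pre-rendered” arrangement looks roughly like this (a minimal sketch; adapter options vary by version):

// astro.config.mjs: the whole site becomes SSR
import { defineConfig } from "astro/config";
import node from "@astrojs/node";

export default defineConfig({
  output: "server",
  adapter: node({ mode: "standalone" }), // mode is required in newer adapter versions
});

// src/pages/whatever.astro (frontmatter): each static page then opts back in with
// export const prerender = true;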

A brief spark of hope quickly extinguished (I didn’t really want to use @astrojs/image anyway, so I wasn’t too torn up about it), I continued with my proof-of-concept. I set up a route at /assets/images/[filename].[ext].ts and wrote out the logic I’d developed before. Halfway through the process, though, I got to thinking about how I would deploy the public/dist/server code — GitLab Pages unfortunately doesn’t have a serverless offering. (As amazing as that would be, its only aim is to offer feature parity with GitHub Pages.)
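For what it’s worth, the endpoint only needs to be a thin wrapper around that resizing logic. A skeleton (names are hypothetical, and newer astro versions spell the export GET instead of get):

// src/pages/assets/images/[filename].[ext].ts
import type { APIRoute } from "astro";
import { resize } from "../../../lib/resize"; // the hypothetical helper sketched above

export const get: APIRoute = async ({ params, request }) => {
  const width = Number(new URL(request.url).searchParams.get("w") ?? 1200);
  const body = await resize(`${params.filename}.${params.ext}`, width);

  return new Response(body, {
    // naïvely derive the content type from the extension
    headers: { "content-type": `image/${params.ext}` },
  });
};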

Serverless

Setting the astro code aside, I went down a deep, deep rabbit hole of self-hosted serverless offerings. I won’t bore you, but each product I considered (Kubeless, OpenFaaS, Fission, OpenWhisk, and Lagon) was duly disqualified for one reason or another. The process was valuable, though — I discovered that most serverless frameworks are just a fancy way to build and deploy a Docker image.

So, I built a Docker image. In the process, I found that yarn@berry makes writing a thin Dockerfile VERY difficult. I’ll write more on that soon; it got too lengthy to include here, and isn’t strictly relevant (I think it deserves its own post anyway). In any case, I wanted something easy, and maintaining this Dockerfile across astro and yarn updates wasn’t a responsibility I cared to take on for something as simple as rendering images.

For posterity, the plan was going to be:

  • GitLab Pages would still host the static files (the public/dist/client directory)
  • An IngressRoute would route requests for known API endpoints into the Pod
  • public/dist/server/entry.js would handle those requests, routing each to the appropriate code

imgproxy + varnish

I took a step back. All I really wanted was a service that would resize images on demand, the same as the CDNs I’ve worked with in the past (Shopify, Sanity, and Contentful, to name a few). I did a bit of research and stumbled on both thumbor and imgproxy. I decided to go with imgproxy, as S3 support in thumbor seemed to be provided by a third party, and I didn’t want to go writing a custom Dockerfile if I didn’t have to.

It took a bit of digging in imgproxy’s documentation to find the options I needed to set (it’s not laid out very well, but at least there’s a search!), and I quickly discovered it doesn’t have any caching mechanism built in. Not great, but I’ve used varnish before, so it seemed like a simple fix — and indeed it was, with a 5-line .vcl file and a couple of environment variables. (imgproxy also only needed a few environment variables to get running the way I wanted.)

Here’s the final traffic flow:

diagram of traffic flow: through traefik, then through varnish, then (if necessary) through imgproxy

The only thing left to do was hammer out some code to generate (and sign) the imgproxy URLs, which I implemented as an astro plugin. It would be amazing if imgproxy offered an SDK for this purpose, like most major CDNs do. Mine isn’t complete enough to be considered one, as it only handles a small fraction of the image transformation options imgproxy supports.
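The signing half is only a handful of lines. Roughly (a sketch; the hostname and the lone w: processing option are placeholders, and the key/salt must match whatever imgproxy itself is configured with):

import { createHmac } from "node:crypto";

// IMGPROXY_KEY and IMGPROXY_SALT are the same hex-encoded values imgproxy runs with
const key = Buffer.from(process.env.IMGPROXY_KEY!, "hex");
const salt = Buffer.from(process.env.IMGPROXY_SALT!, "hex");

export function imgproxyUrl(source: string, width: number): string {
  // processing options, then the base64url-encoded source URL
  const path = `/w:${width}/${Buffer.from(source).toString("base64url")}`;

  // imgproxy verifies an HMAC-SHA256 over salt + path, base64url-encoded
  const signature = createHmac("sha256", key).update(salt).update(path).digest("base64url");

  return `https://media.blog.ezracelli.dev/${signature}${path}`;
}

// e.g. imgproxyUrl("s3://blog-media/xxx.jpg", 1200)
// (the s3:// source assumes imgproxy's S3 integration is enabled)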


I’m super happy with the solution I ended up with, and I had an absolute blast working through this surprisingly monstrous puzzle.