Compare commits


2 commits

2 changed files with 20 additions and 4 deletions


@@ -0,0 +1,16 @@
#+title: Migrating Mastodon S3 Providers
#+date: 2024-03-08
As part of some recent work Kakious and I have been doing on the Mastodon instance we run, Floofy.tech, we were looking to cut costs while also improving performance. Part of that was migrating to new hardware using Proxmox rather than ESXi, and that migration was fairly unremarkable - we're still running everything in Kubernetes across multiple VMs, so there isn't much to note (except that we also moved from k3s to Talos - I will have a longer post about that at some point). Another part was moving storage providers for the instance. We were using DigitalOcean Spaces, which served us well, but the pricing left quite a bit on the table: for $5/month, you get 250GiB of storage and 1TB of egress, with $0.02/GiB stored and $0.01/GiB of egress after that. Our instance fell well within those limits - /very/ comfortably, in fact, to the point we would certainly save money going elsewhere. Being employed at Cloudflare, and already having a decent setup for Floofy there, we turned to the R2 offering. With no egress costs and less than 100GiB stored (on average - it depends how busy the fediverse is!), we should be paying nowhere near $5/month, since we only pay for storage and responses are heavily cached.
So! With that decided, it was time to figure out our migration path. The plan was simple: using rclone, set up two remotes on a temporary virtual machine (or LXC container, in this case) and run a bulk transfer overnight. Once complete, we'd run one more sync, then quickly swap the configuration on the Mastodon instance. The window between the last sync and the Mastodon instances picking up the new configuration should be small enough that we wouldn't miss any new media. Finally, we'd swap the DNS over to point at our R2 bucket, which should update quickly as the DNS was already proxied through Cloudflare.
Setting up the container was straightforward - we grabbed a Debian 12 image, installed rclone, and set up two remotes: one pointed at our existing DigitalOcean Spaces bucket (=digitalocean-space=) and the other at our new Cloudflare R2 bucket (=cloudflare-r2=). After a quick =rclone lsd= on both remotes to confirm connectivity, and a dry-run sync or two to verify, we were ready to go. I loaded up tmux, hit go, and waited.
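For reference, the two remotes can be sketched in =~/.config/rclone/rclone.conf= along these lines - the keys, endpoint region, and account ID below are placeholders, not our real values:

```ini
# Both providers speak the S3 API, so both remotes use rclone's s3 backend.
[digitalocean-space]
type = s3
provider = DigitalOcean
access_key_id = <spaces access key>
secret_access_key = <spaces secret key>
endpoint = nyc3.digitaloceanspaces.com

[cloudflare-r2]
type = s3
provider = Cloudflare
access_key_id = <r2 access key id>
secret_access_key = <r2 secret access key>
endpoint = <account-id>.r2.cloudflarestorage.com
```

With that in place, =rclone lsd digitalocean-space:= and =rclone lsd cloudflare-r2:= confirm connectivity, =rclone sync --dry-run digitalocean-space:<bucket> cloudflare-r2:<bucket>= previews the transfer, and dropping =--dry-run= runs it for real.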
It was going smoothly until the container decided to drop its internet connection. I'm still not sure what caused this, but after running =dhclient= it was fine again. Otherwise, the sync went off without a hitch.
When I woke up, it was time to run another sync then make the configuration change. I'd already changed the values in our Kubernetes state repository, so it was just a case of pushing it to GitHub and letting ArgoCD sync everything up. First, I reran the =rclone sync= to ensure that anything new was synced over, then quickly pushed the configuration up. It took about a minute to cycle the pods to the new configuration, at which point I removed the DNS record pointing to DigitalOcean and swapped it over to the Cloudflare R2 bucket. Done!
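In Kubernetes terms, the change was just swapping Mastodon's S3 environment values in the state repository. A rough sketch of what that looks like - the bucket name, hostnames, and account ID here are illustrative, not our actual values:

```yaml
# Mastodon S3 settings, illustrative values only
env:
  - name: S3_ENABLED
    value: "true"
  - name: S3_BUCKET
    value: "mastodon-media"
  - name: S3_ENDPOINT
    # was: https://nyc3.digitaloceanspaces.com
    value: "https://<account-id>.r2.cloudflarestorage.com"
  - name: S3_ALIAS_HOST
    # the proxied hostname media URLs are served from; this stays
    # the same - only the DNS record behind it moves to R2
    value: "media.example.com"
```

Because =S3_ALIAS_HOST= stays constant, already-federated media URLs keep working after the swap - only the origin behind the DNS record changes.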
I genuinely expected this to be more difficult, but it really was that easy. This process would work for any rclone-compatible storage provider, of which there are many, so I'd feel pretty comfortable recommending it to others. Depending on how busy your instance is, it may be worth doing a final =rclone copy= (which copies new files but doesn't delete from the target) to catch any stragglers after the configuration change, and depending on your DNS setup you may need to lower the time-to-live values ahead of the migration, but we didn't really hit those caveats.
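If you do want that belt-and-braces pass, it's a one-liner - again with placeholder bucket names:

```shell
# Copy anything uploaded to the old bucket after the last sync; unlike
# `rclone sync`, `rclone copy` never deletes from the destination.
rclone copy digitalocean-space:mastodon-media cloudflare-r2:mastodon-media --progress
```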
Hopefully this is helpful to others - if you have any questions, feel free to poke me on the federated universe [[https://floofy.tech/@arch][@arch@floofy.tech]].


@@ -1,5 +1,5 @@
#+title: Porting to Workers
-#+date: 2024-01-30
+#+date: 2024-01-28
This website is now using Cloudflare Workers!
@ -11,7 +11,9 @@ So first, post storage! With Worker size limits, I decided to go with storing po
After some consideration, I scrapped the idea of generating and storing the result for other Workers on the fly and looked at the Queue option instead. The plan was to pre-render the content and store it somewhere (more on that later) so I can very quickly render content in the background when something is published. When a file is pushed to R2, I can fire off a webhook that queues up the new or changed files for rendering and storing on the edge. It does seem to introduce a little more latency when it comes to publishing content, but in reality it's faster because it doesn't require me to rebuild, push, and restart a container image.
Where to store the rendered content stuck with me for a bit. Initially I wanted to go with KV, since it seemed it would be faster, but I found after some experimentation it was substantially slower since there's no way to easily sort the keys based on content without reading /everything/ into memory and then sorting during Worker execution. Thankfully, I could reach for a real database, and created a D1 instance to hold a single table with the posts. It being SQLite based, I can just use SQL for the queries and take advantage of much more optimised codepaths for sorting or fetching the data I actually need.
-While replication might be slower than KV, it's far from noticeable.
+While D1 doesn't currently replicate, it will be a huge speed boost when it is!
+/Note: this section originally said that D1 replicates. I was then told, and discovered, this is not the case at the moment. Whoops./
The workflow thus far is
@@ -19,8 +21,6 @@ The workflow thus far is
2. A webhook is sent to a Worker (not by R2)
3. The worker fetches the list of files from R2 and queues them for "indexing"
4. Workers are executed to consume the queue, rendering the files and storing them in D1
-5. D1 is replicated to the edge
-6. Workers on the edge now have access to the rendered content at the edge
The final piece is telling the Worker it can cache all the responses in Cloudflare's cache, and we're all set! Each response is cached for 4 hours before a Worker has to be hit to fetch the content from D1 again.