small hallucinations

TIL 251004

I've been reading “Designing Data-Intensive Applications”.

Some interesting things I’ve learned so far:

Human error, not hardware, is the leading cause of outages. To quote the book:

...one study of large internet services found that configuration errors by operators were the leading cause of outages, whereas hardware faults (servers or network) played a role in only 10–25% of outages.

Hardware failures happen more often than I'd expected. Each piece of hardware is eventually going to fail. Two useful metrics are “mean time to failure” (if you throw it away when it fails) and “mean time between failures” (if you repair it when it fails). The values of these metrics aren’t infinite. With so many CPUs, RAM modules, GPUs, and hard drives, something will be failing all the time.

One example given in the book goes like this:

Hard disks are reported as having a mean time to failure (MTTF) of about 10 to 50 years. Thus, on a storage cluster with 10,000 disks, we should expect on average one disk to die per day.
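The arithmetic behind that estimate is easy to check. Here is a rough sketch, assuming failures are independent and spread evenly over time (a simplification, since real disks don't fail at a constant rate). The function name is made up for this note.

```python
def expected_failures_per_day(disk_count, mttf_years):
    """Average disk failures per day if each disk fails once per MTTF on average."""
    mttf_days = mttf_years * 365
    return disk_count / mttf_days


for mttf_years in (10, 50):
    rate = expected_failures_per_day(10_000, mttf_years)
    print(f"MTTF {mttf_years} years: about {rate:.2f} failures per day")
# MTTF 10 years: about 2.74 failures per day
# MTTF 50 years: about 0.55 failures per day
```

So “one disk per day” sits inside the range of roughly 0.5 to 2.7 failures per day. Under this model it corresponds to an MTTF of about 27 years.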

On top of that, modern cloud services prioritize flexibility and elasticity over the reliability of any single machine, so an individual instance can become unavailable without warning. You need to design your software to tolerate losing a machine.

One of the techniques mentioned in the book for handling faults is process isolation.

In ancient times, when software ran closer to the bare metal, this concept meant one operating system process should not touch memory addresses or other resources used by another process. In our modern-day context, this concept extends to technologies like containerization.

What I learned building a scraper and RSS generator

I promised a friend I'd build a tool to monitor changes on a website and convert the updated articles to an RSS feed.

At first I tried using Django, which boasts “batteries included”. There is a Django library that handles scheduled tasks. I forgot the name—it was such a long time ago after all. What I remember is that it caused circular library dependencies and required setting up migrations, since it managed tasks and their run records in the database.

Months passed before I attempted the project again.

This time I used Go. I had built small projects in Go prior to this. I could unapologetically say “I know Go,” because who doesn’t, with its syntax being so transparent?

Yet it still took me a long time to finish the project.

There were conflicting incentives. On top of building the project, I wanted to learn new things, and it’s fair (even good) to learn new things along the way. At this point I didn’t want to build too much of a UI, so I read about HTMX and Alpine.js, both of which promised interactivity with minimal scripting in HTML pages. I opted for Alpine.js after comparing their syntaxes. Yet after some struggling with templating in Go, I missed JSX. I also found it difficult to wrap my head around embedding data into an HTML element using a custom attribute.

Then there was mission creep. When I set out to work on this project, the initial goal was to monitor one section on one website. Then I asked myself, wouldn’t it be more useful if I allowed people to add websites to track?

In the final product, you can add websites and sections. The app monitors website changes, scrapes pages whose URLs match a pattern, and extracts the title, author, publication date, and content using CSS selectors. All the updates are displayed in the RSS feed view.
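To give a feel for the shape of this, here is a minimal sketch in Python rather than the Go the app is actually written in. The `Section` type, its field names, and the example URLs and selectors are all hypothetical, not taken from the real project. It covers only the URL-pattern step. Applying the CSS selectors would need an HTML parsing library and is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    """One tracked section of a website (hypothetical shape)."""
    url_pattern: str  # regular expression an article URL must match
    selectors: dict   # field name -> CSS selector used for extraction


def pages_to_scrape(section, urls):
    """Keep only the URLs that match the section's pattern."""
    pattern = re.compile(section.url_pattern)
    return [url for url in urls if pattern.search(url)]


news = Section(
    url_pattern=r"^https://example\.com/news/\d{4}/",
    selectors={"title": "h1", "author": ".byline", "date": "time", "content": "article"},
)
found = pages_to_scrape(news, [
    "https://example.com/news/2025/launch",
    "https://example.com/about",
])
print(found)  # ['https://example.com/news/2025/launch']
```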

Then I thought, who has time to read all this word soup? So I decided to add an API call to ask OpenAI to summarize the full text for me. Now, with these added features, I moved the UI from Go templates and Alpine.js to a full-blown React project.

GitHub Copilot helped a lot during development. One shift in my mindset especially helped me accelerate the development process.

At the beginning, the questions I asked LLMs were “how do I do this?” Upon getting a response, I'd read it carefully, trying to understand the suggested approach and the reasoning behind it.

While good for learning, this significantly slowed me down. As coding agents became more capable, I soon slipped into asking “Do that for me.” Then the whole process became much faster and more pleasant.

I have a habit of taking notes and creating Anki cards. I thought conversations with LLMs were a good source of knowledge. In the end I realized most of these conversations are transient, scenario-specific, and not worth memorizing.

I’m sure there is a lot of background knowledge behind how each function is called and how each code block is structured, and such knowledge is useful for someone like me who’s relatively new to Go.

But there's a cadence to learning and building. To use a painting analogy, laying out the perspective and applying colors are both important. “How do I do it?” questions are the latter. When you let coding agents solve these problems, you can focus on the perspective part, which is more relevant to the structure of the whole picture.

In a real problem I faced, “How do I handle a nil value when I parse a row of SQL query results?” is about a detail. The fact that you need to handle the nil value is more about the whole. As long as you know you need to handle that, I figure it's fine to delegate the details to coding agents.

Fixing a 503 error caused by health probe

I was working with a container app deployed on Azure recently. This container app provides a REST endpoint that allows users to upload files for processing.

A few days ago, uploading larger files started failing repeatedly. These files weren't particularly large either. One that constantly failed to upload was only 4 MB.

I tried uploading this file both via curl and through the web UI. Both attempts failed with a 503 status code. That ruled out a CORS issue: CORS is enforced by the browser, so it would not have caused curl to fail, and it would not have shown up as a 503 from the server.

Interestingly, we didn't find these POST requests in the logs. This suggested the requests never reached the container app.

By inspecting the configuration of this and other container apps deployed on Azure, I noticed a health probe setting for this app.

It turned out Azure was checking whether the service was alive every 10 seconds. When the app became temporarily unresponsive during the upload and processing, the health probe likely timed out.

Azure would have interpreted this as a sign that the container was down, and either removed it from the load balancer or tried to restart it. Either way, the request was abruptly terminated, resulting in the 503 error.
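The reasoning can be sketched as a toy model. It assumes a probe fails exactly when it fires while the app is busy, and that the platform gives up after a fixed number of consecutive failures. The threshold of 3 is an assumed value for illustration, not something read from the actual Azure configuration. Only the 10-second period comes from the setting I found.

```python
def failed_probe_times(busy_start, busy_end, period=10, horizon=120):
    """Probe times (in seconds) that fall inside the unresponsive window."""
    return [t for t in range(0, horizon, period) if busy_start <= t < busy_end]


def marked_unhealthy(busy_start, busy_end, period=10, failure_threshold=3):
    """True if enough consecutive probes fail to cross the threshold."""
    # The busy window is contiguous, so all failed probes are consecutive.
    return len(failed_probe_times(busy_start, busy_end, period)) >= failure_threshold


print(failed_probe_times(5, 40))  # [10, 20, 30]
print(marked_unhealthy(5, 40))    # True
print(marked_unhealthy(5, 12))    # False
```

In this model a short stall survives, while an upload that blocks the app for half a minute gets the container marked as down.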

July movies

I saw these movies in July:

Shortcomings (2023) five stars. Randall Park is humorous and did a great job with this movie. Adrian Tomine, the creator of the original graphic novel, is sitting in the theater when Ben announces its closing. This is a small personal reassurance that “it's OK.”

She Came to Me (2023) four stars.

War of the Worlds (2025) four stars. Actually maybe five, because of its innovative storytelling: everything is shown on a computer screen. Searching (2018) tried this idea earlier, but telling a story about an alien invasion on the screens of computers and smartphones is way more challenging. Quite a few technical details are clearly off. It's impossible to write malware that quickly. It's also weird to upload a Windows executable to a “data center” using a flash drive. But these are forgivable if we can accept a carbon-and-silicon-based hybrid lifeform with badly designed long legs.

Sell something useless

Apparently Labubu, the cute plushie with a wicked smile, has become a thing.

Wang Ning, the founder of Pop Mart, believes in selling what is “useless”. One thought experiment he uses to illustrate this idea goes like this:

Would we sell as many Molly toys if we added a USB flash drive to them?

That would certainly give the toys some kind of “use”. But that also reminds potential buyers that they do not actually need that “use”. Who really needs another USB thingy after all?

A similar idea is being discussed in Japan’s retail industry, where businesses are said to be transitioning from selling “mono” (もの) to selling “koto” (こと). Both words translate to “things” in English. The distinction is that “mono” means a tangible object, while “koto” means something intangible, such as an event or an experience.

Fixing a `form-data` boundary error

I'm starting to use the err tag on this blog to document these small things.

I was trying out an endpoint that takes a file field. The code looked something like this.

    import requests

    file = {'document': open('document.pdf', 'rb')}

    headers = {
        'Content-Type': 'multipart/form-data',
        'Accept': 'application/json'
    }

    response = requests.post(url,
        files=file,
        headers=headers)

Upon sending this request, I was greeted with a "boundary error".

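This happens because requests generates the multipart boundary itself and writes it into the Content-Type header along with the media type. If you set Content-Type manually, your value replaces the generated one and the boundary parameter goes missing, so the server cannot split the body into parts. The fix is to drop the Content-Type entry from the headers and let requests set it.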
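To make the role of the boundary concrete, here is a hand-rolled sketch of a single-file multipart/form-data body using only the standard library. It is not how requests is implemented internally, and `encode_multipart` is a made-up helper. The point is that the same boundary string has to appear in both the Content-Type header and the body.

```python
import uuid


def encode_multipart(field_name, filename, payload, boundary=None):
    """Return (content_type, body) for a single-file multipart/form-data upload."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The header names the delimiter the server should look for in the body.
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, head + payload + tail


content_type, body = encode_multipart("document", "document.pdf", b"dummy file bytes")
print(content_type)
print(body.decode("latin-1"))
```

A header of just `multipart/form-data`, as in my original code, tells the server a delimiter exists but not what it is. With requests, passing `files=` and leaving Content-Type out of `headers` produces a header of this shape automatically.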

So I wrote a new static-site generator

I used to generate this blog from Markdown files using my own static site generator written in Python. The resulting HTML files are then hosted on Cloudflare Pages.

At some point late last year, the Python codebase stopped running smoothly because of dependency issues. And I don't remember tweaking my system or Python version, or adding or removing these libraries.

At first, the Markdown parser I used went missing. I had to install it again in a new virtual environment. Then the feed generator complained that an argument was missing in a function call. (These kinds of minor problems will just crop up.)

So I decided to rewrite the SSG behind this blog in Go.

The whole process was pleasant. GitHub Copilot was very helpful. It suggested which dependencies I could use to parse Markdown and to generate RSS feeds (goldmark and gorilla/feeds), and its code completions were often useful. A few suggested solutions used deprecated function calls but were educational nonetheless.

When you program in a new language, old habits from other languages carry over. Here are a few things I learned while being nudged by Copilot:

  1. Go doesn't support string interpolation with variable names. You format with fmt.Sprintf and format verbs instead. There is a whole discussion about it.
  2. Go doesn't support optional arguments in function signatures. Instead, you can use a variadic function or an options struct.
  3. I tried to import a struct from the main package in a child package, which was a dumb idea, because “importing the main package directly can lead to circular dependencies, which are not allowed in Go.” Instead, I could “move the BlogSetting type to a separate package, which can then be imported by both the main and template packages.”