
Favvy Icons

What is a favicon?

I must confess. There is this pet peeve I have. Whenever I come across a project, website, or app which sports the default favicon (favorite icon) of whatever framework, tool, or library used to make said project, website, or app, I cannot help but be disappointed. Nay, more than disappointed, the emotion comes close to that of disgust.

It's such a small, almost meaningless detail. A tiny little image, often invisible on mobile browsers. It shows up in one, maybe two places. Why does it even matter?

Well, it matters to me. It matters rather much in my estimation. It is one of those details that you'd think the brain doesn't notice but it does. It shows a care for a product. A focus. Small things matter.

“Wait sir, not everyone is a graphic designer, is it really worth so much time for a tiny detail? Surely, SURELY the functionality is of a higher priority?” you say.

In a world of AI it is especially obnoxious to me because it takes so little effort to say, “make sure to include an svg favicon of a walrus or some shit” in whatever prompt, agent, or chat people use these days. Really, it is 30s of effort, 15 if you copy what I just wrote.

All this souring got me thinking though. How well do I, or you, REALLY know favicons? If they can't even be recognized, do they really matter?

Who's that favicon?

OK, here is the game: pick 4 websites from the top 1,000,000, show the favicon of one of them, and then guess which domain it belongs to. Bonus if you can filter by category and rank.

I wanted to make this game for a while but figured now was as good a time as any to do so.

The technical stack here will be basic

This is known as the Squeehstack and can be easily served by your neighborhood VPS provider. In this case I will reuse a $6 docklet.

Next, the data.

No really, who is that favicon?

Getting the top websites can be done in many a way. To save time, I used the CrUX list built from Chrome usage data. A big ol' CSV file with a domain name and a rank bucket.

I did a quick eyeballing of the domains and saw some patterns which would make for a poor game experience.

First, subdomains. They add duplicates. So I removed everything other than www and the list dropped down to about 560k domains. Wow.
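A minimal sketch of that filter, not the exact script I ran. The row shape, the function names, and the tiny list of two-part suffixes are all assumptions for illustration; a serious version would lean on the full Public Suffix List instead of a hand-typed set.

```typescript
// Hypothetical row shape: one origin plus its rank bucket per CSV row.
interface CruxRow { origin: string; rank: number; }
interface DomainRow { domain: string; rank: number; }

// Illustrative sample only; the real Public Suffix List is far longer.
const TWO_PART_SUFFIXES = new Set(["co.uk", "com.au", "co.jp", "com.br"]);

// Returns the bare domain if the origin is an apex or www host, else null.
function registrableHost(origin: string): string | null {
  // Strip the scheme and anything after the host, without relying on URL.
  const host = origin.replace(/^https?:\/\//, "").split("/")[0].toLowerCase();
  const bare = host.startsWith("www.") ? host.slice(4) : host;
  const labels = bare.split(".");
  const suffixLen = TWO_PART_SUFFIXES.has(labels.slice(-2).join(".")) ? 2 : 1;
  // Exactly one label in front of the suffix means "not a subdomain".
  return labels.length === suffixLen + 1 ? bare : null;
}

// Drops subdomains and collapses www/apex duplicates, keeping the best rank.
function dedupeDomains(rows: CruxRow[]): DomainRow[] {
  const best = new Map<string, number>();
  for (const row of rows) {
    const host = registrableHost(row.origin);
    if (host === null) continue;
    const prev = best.get(host);
    if (prev === undefined || row.rank < prev) best.set(host, row.rank);
  }
  return Array.from(best, ([domain, rank]) => ({ domain, rank }));
}
```

Feed it the parsed CSV rows and it hands back one row per domain, with `www.example.com` and `example.com` merged and `mail.example.com` thrown out.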

The next problem is that everyone knows the internet has always and will always serve a singular purpose, porn. Or so I thought. (more on this later)

I wasn't sure how best to handle the porn problem. It isn't so easy to remove it all. Porn websites are pretty clever with their naming. Femfatal.com could be a fashion website, a modeling site, or something less work friendly. The last thing I want is for a pixelated lewd favicon to show up unexpectedly on someone's computer, as amusing as that may be.

I figured now would be the time to bring out the big guns after my fancy regexfu failed. And by big guns I mean “AI”.

Please, just tell me what the favicon is?

I also want to categorize the websites to make it more fun. Having the ability to play a round where only the top 500 news websites would add variety. The bonus is that I can have a NSFW bucket and just shove all the bad stuff in there. Slick.

I decided to use Gemini and, like a good engineer, I thought function calling would be best. I wanted to batch many domains at a time since Gemini has a limit of 10k daily requests. After all, 10k is far less than 560k and I don't want to wait 56 days. I am told that function calling is super good now so I had high expectations.

It wasn't. Gemini failed to work well with it all. Sometimes I'd get an error which basically meant the AI had spat out incorrect params for the function. Now this function call is not complicated, just an array of categories. It would get it right maybe 50% of the time. I even added a retry loop which barely helped and blew through even more precious API calls.

I ended up using the good ol' one-shot prompt with a “pretty pretty please return a comma separated list of categories.” strategy which worked much better but not perfectly. The larger the batch size, the more likely it would fail by returning not enough or too many categories. With a batch size of 50 I can categorize all the domains in roughly a single day's worth of API requests.
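The batching and the sanity check on the reply look roughly like this. It is a sketch under assumptions: the category names are made up for the example, the helper names are hypothetical, and I am assuming one category per domain in the reply. The actual call to Gemini is left out entirely.

```typescript
// Hypothetical category list; the real one is not shown in this post.
const CATEGORIES = ["news", "shopping", "gaming", "sports", "travel", "nsfw", "other"] as const;
type Category = (typeof CATEGORIES)[number];

// Splits the domain list into fixed-size batches (the last one may be short).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Parses a comma separated reply. Returns null when the count is off or any
// entry is not a known category, so the caller can retry or log the batch.
function parseReply(reply: string, expected: number): Category[] | null {
  const parts = reply.split(",").map((p) => p.trim().toLowerCase());
  if (parts.length !== expected) return null;
  const known = CATEGORIES as readonly string[];
  const valid = parts.every((p) => known.indexOf(p) !== -1);
  return valid ? (parts as Category[]) : null;
}
```

A `null` from `parseReply` is the signal to retry the batch, and after enough retries, to give up on it and write it to the failed pile.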

I manually checked a few batches and Gemini did a decent job setting categories, not perfect but decent, so I let it rip. About 20 minutes later I had a nice little SQLite db full of the results.

I wrote a file of all the batches which failed either from the number of retries never passing validation (no invalid categories, right count, etc) or from triggering safety violation filters. There were over 100k that failed, mostly in the bottom set of domains. This is a pretty large number, almost 25% of the dataset.

You see, Gemini is made by Google which is a pretty woke company. This means they have a strong “no bad bad” filter. Some of the domains which would land in the NSFW bucket triggered the “bad bad” filter. I had a batch size of 50 domains. This meant that many innocent domains were caught in the fallout of a single no-no word. I did figure out what was likely causing Gemini to get so mad, which I won't go into here, but it does bring up one of the constant obnoxious issues with modern big LLM providers: the “Trust and Safety” filters.

I almost considered swapping to OpenAI with the new “adult mode” or Grok...but you know, “eww sick gross.”

It cost me about a buck to categorize everything. It wouldn't be bad to tweak the prompt and rerun it all or just rerun the failed domains.

Instead, I decided that the ~400k domains which were categorized were good enough.

You don't know what it is, do you?

Google has a nice little API which returns the favicon for any domain. Amazing. The game API will lazy load favicons and save them into SQLite as a blob. Not all websites use the same image format so I will process the images into optimized PNGs before saving. Perfect.
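A rough sketch of the lazy loading, with loud assumptions. The URL below is the widely used `s2/favicons` endpoint, which I am assuming is the one meant here. The class and type names are hypothetical, an in-memory `Map` stands in for the SQLite blob column, and the PNG conversion step is skipped. The loader is injected so the sketch does not depend on any particular HTTP client.

```typescript
// Builds the favicon URL for a domain. The endpoint and its `domain` and `sz`
// parameters are an assumption about which Google service the post means.
function faviconUrl(domain: string, size: number = 64): string {
  return `https://www.google.com/s2/favicons?domain=${encodeURIComponent(domain)}&sz=${size}`;
}

// Injected fetcher: resolves to the image bytes, or null on a 404.
type Loader = (url: string) => Promise<Uint8Array | null>;

// Fetches each favicon at most once; later requests are served from the cache.
class FaviconCache {
  private blobs = new Map<string, Uint8Array>();
  private load: Loader;

  constructor(load: Loader) {
    this.load = load;
  }

  async get(domain: string): Promise<Uint8Array | null> {
    const hit = this.blobs.get(domain);
    if (hit !== undefined) return hit;
    const blob = await this.load(faviconUrl(domain));
    // Only successful loads are cached; a null lets the caller drop the domain.
    if (blob !== null) this.blobs.set(domain, blob);
    return blob;
  }
}
```

In the real thing the `Map` would be a table lookup and the insert would happen after the image is converted, but the shape is the same: check storage first, fetch only on a miss.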

If a domain 404s, I'll remove it from the db. Always keep a tight ship.

I won't go into the details but getting 4 random records from a SQLite db in a performant way with a limit is...interesting. Suffice it to say, it was not fun.
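For the curious, here is one common way to go about it, which is not necessarily what the game does. The idea is to pick 4 distinct random offsets inside the ranked slice and fetch each row by offset, instead of sorting the whole table randomly. The table name, column names, and the query itself are hypothetical.

```typescript
// Picks k distinct integers in [0, n) by rejection sampling.
// The rng parameter exists so the function can be tested deterministically.
function pickDistinct(k: number, n: number, rng: () => number = Math.random): number[] {
  if (k > n) throw new RangeError("cannot pick more distinct values than exist");
  const seen = new Set<number>();
  while (seen.size < k) seen.add(Math.floor(rng() * n));
  return Array.from(seen);
}

// Hypothetical query, run once per offset from pickDistinct(4, limit).
// The offset scan is bounded by the limit, so it stays cheap for the top few hundred rows.
const ROW_AT_OFFSET_SQL =
  "SELECT domain FROM sites WHERE category = ? ORDER BY rank LIMIT 1 OFFSET ?";
```

Rejection sampling is fine here because 4 is tiny next to the size of the slice, so collisions are rare.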

After getting the single api endpoint wrapped up I spun up the client.

Look, I did it in static HTML and JS files. It was messy, very messy. Bun still isn't the UX I want for static JS websites, and as much as I like React it wasn't worth fucking around with some bundling bullshit.

And no, no htmx. Stop trying to make it happen. It won't happen. I don't have time to explain why. You can find the code on GitHub if you want to prove me wrong.

Deploying was more painful than Vercel but less than AWS. A quick nginx config update and a domain purchase later and it is live.

It's a duck

Introducing, favvy.fun.

One thing I noticed is that the internet landscape has changed. NSFW may have been dethroned as the most dominant use of the interwebs. What has taken over? Well, I found out when I tried the “Gaming” category.

I was hoping for some World of Warcraft, Fortnite, and LoL websites. Instead, it was gambling. That's right, gambling up, down, left and right. Even sideways and y-ways too. So much gambling. Like, almost every category will eventually run into some kind of gambling site. Shopping? CS:GO skins. News? Best time to do pulls for some gacha game. Travel? Casinos.

Sports? Hehe, oh sweet summer child. Sports is nothing but gambling. DraftKings is only the tip of the iceberg here.

I may end up doing another pass with a specific “gambling” bucket but I think this is fine for now. It is pretty fun to take the top 100-500 of a category and see how well you do.

A new pet peeve was discovered though. If you have your fucking company name as a favicon, which is often no more than 40 pixels large, stop, get help. Seriously, why is your name in the icon? What are you doing? Please tell me you didn't pay an agency for it.

[images: bad icon b, bad icon c]

It is common in iconography to use the first letter of the domain, like Amazon or Google, with a bit of flair, which works well for a favicon. Using a few letters can help too, like CNN does, but it quickly feels cramped.

The app I made uses FAV stacked on one another which helps with space but is still admittedly weak.

[images: an ok icon, bad CNN icon, an oker icon]

It is important to know that favicons support many formats, including SVG with transparency. The size does not have to be small either, as most browsers will resize it. Apple has its own special icons one can set, but Apple will oft change how they get displayed.

The best favicons, and iconography for that matter, are simple shapes which immediately evoke something, ideally the website, product, or brand's intended meaning. Letters are a shortcut for this and useful especially when a brand is new. Nothing wrong with using them early on.

Apple is a poster child for great design but Xbox is almost as iconic.

[images: an ok Apple icon, Xbox icon]

I have another favorite though.

[image: Bun's favicon]

I cannot explain why in words but when I think of a JavaScript runtime named "Bun", knowing all that I know of developers, engineers, and the JS ecosystem, it is exactly what comes to mind. It is certainly missing that "iconic" brand feeling but it is still young. Give it time.

It is the little things in life, the small details everyone sees but few notice, that give the world texture, making it worth living in.

Until next time.
