
Svenska trainer: what I built, what broke, and what is next

A couple of weeks ago, I published a post about building a Swedish A1 vocabulary trainer from scratch. Since then the app has grown quite a bit. New features, new bugs, new fixes, and now a full A2 level. This post is about what happened in between.

Disclaimer: this post goes a little deeper into the technical weeds than usual. But I will try to explain the reasoning behind each decision, not just what I did.

Spaced repetition: the algorithm that actually works

The original app used SM-2. The idea is simple: you review a word right before you are about to forget it. Get it right and the next review is scheduled further away. Get it wrong and it comes back the next day.
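
For readers who have not seen SM-2 before, here is a minimal sketch of the scheduling rule in JavaScript. It follows the published SM-2 description (intervals of 1 day, then 6 days, then the previous interval times an ease factor), but the field names and the choice to multiply by the ease factor from before the update are assumptions for illustration, not the trainer's actual code.

```javascript
// Minimal SM-2 sketch. Field names (repetitions, intervalDays, easeFactor)
// are assumptions for illustration, not the trainer's actual data model.
const MIN_EASE = 1.3;

function newCard() {
  return { repetitions: 0, intervalDays: 0, easeFactor: 2.5 };
}

// quality: 0 (complete blackout) to 5 (perfect recall), as in SM-2.
function reviewCard(card, quality) {
  if (quality < 3) {
    // Wrong answer: restart the sequence and bring the word back tomorrow.
    // The ease factor is left unchanged in this case.
    return { ...card, repetitions: 0, intervalDays: 1 };
  }

  // Right answer: 1 day, then 6 days, then grow by the ease factor.
  let intervalDays;
  if (card.repetitions === 0) intervalDays = 1;
  else if (card.repetitions === 1) intervalDays = 6;
  else intervalDays = Math.round(card.intervalDays * card.easeFactor);

  // Easy answers raise the ease factor, hesitant ones lower it (floor 1.3).
  const miss = 5 - quality;
  const easeFactor = Math.max(
    MIN_EASE,
    card.easeFactor + (0.1 - miss * (0.08 + miss * 0.02))
  );

  return { repetitions: card.repetitions + 1, intervalDays, easeFactor };
}
```

With this sketch, three perfect answers in a row schedule a new word 1, 6, and then 16 days out, and a single wrong answer puts it back to 1 day.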

What I added was a retention score: a number that tells you not just whether you got a word right, but how much of it you probably still remember right now. If you reviewed a word six days ago and it was scheduled for review in seven days, your retention for that word is roughly 85%. If it was due yesterday and you have not reviewed it, it is closer to zero. The score updates in real time and shows up in the statistics screen, sorted from the words you are most likely forgetting to the ones that are fresh.

This is the clearest UI I have found so far for progress tracking: accuracy tells you how you did in the past. Retention tells you what you actually know today.

Adding A2 and the Kelly Corpus problem

The Kelly Corpus is a dataset maintained by the University of Gothenburg containing the most frequent and useful Swedish words, tagged by CEFR level. The A1 words were manageable to add manually. The A2 level has over 1300 words. Yay!

The corpus does not include English translations. Those live in a separate Swedish lexical database called SALDO, and joining the two programmatically was complex (I talked about this in my previous post). So instead I wrote a Python script that downloads the XML, extracts the top 200 most frequent A2 words by Kelly ID, and calls the Claude API in batches of 20 to generate translations, synonyms, and example sentences.

After running the script I reviewed everything and cut about 30 words that were too advanced, too politically specific, or just not useful for an everyday A2 learner. Words like påstående (assertion) and omständighet (circumstance) did not make the cut. Words like lägenhet (apartment), träna (to exercise), and ikväll (tonight) did. The result is around 170 new words on top of the original 79.

The bug that made progress invisible

For a while, the statistics screen was showing zero learned words even after completing multiple sessions. It is the kind of bug that kills motivation.

The cause was a subtle React state timing issue. When a session ends, the app calls setCards(newCards) to save the updated progress, then immediately navigates to the done screen. But setCards does not change the cards value right away: React queues the update and applies it on the next render. So the done screen was reading the old cards state, not the new one, and the learned and mastered counts were always zero because they were computed before the update landed.

The fix was to pass newCards directly to the done screen rather than reading from state.
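
The shape of that fix can be sketched without any React at all. In the sketch below, summarize, finishSession, showDoneScreen, the intervalDays field, and the thresholds for "learned" and "mastered" are all hypothetical names and values chosen for illustration; only setCards and newCards come from the app.

```javascript
// Sketch only: the field name `intervalDays` and both thresholds are
// hypothetical; the real definitions of "learned" and "mastered" may differ.
function summarize(cards) {
  return {
    learned: cards.filter((c) => c.intervalDays >= 1).length,
    mastered: cards.filter((c) => c.intervalDays >= 21).length,
  };
}

// setCards is the React state setter; showDoneScreen stands in for whatever
// navigation call the app uses.
function finishSession(newCards, setCards, showDoneScreen) {
  // Queues the update. The component's current `cards` value stays the old
  // one until the next render, so it must not be used for the summary.
  setCards(newCards);
  // Compute the counts from the fresh array and hand them over directly.
  showDoneScreen(summarize(newCards));
}
```

The point of the design is that the done screen never depends on when React gets around to applying the state update: it only sees data that was passed to it explicitly.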

Building the word picker with search

One of the features I added is the ability to create your own deck: browse all words in a level, toggle individual ones on or off, and practise exactly what you want. The initial version was a long scrollable list, which worked but was slow to use.

Adding a search box that filters in real time made it much more practical. You can type "verb" to see all verbs, type "hus" to find house and related words, or type "adj" to filter by adjective. The filter matches against the Swedish word, the English translation, and the part of speech at the same time. It uses basic JavaScript string matching on the client, so no server is needed.
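
A filter like this fits in a few lines. The sketch below is one plausible way to write it; the field names sv, en, and pos are assumptions, not the trainer's actual schema.

```javascript
// Field names (sv, en, pos) are assumptions for illustration.
function filterWords(words, query) {
  const q = query.trim().toLowerCase();
  if (q === "") return words; // empty search box: show everything

  // Keep a word if the query appears in any of its three fields.
  return words.filter((w) =>
    [w.sv, w.en, w.pos].some((field) => field.toLowerCase().includes(q))
  );
}

const sampleWords = [
  { sv: "hus", en: "house", pos: "noun" },
  { sv: "träna", en: "to exercise", pos: "verb" },
  { sv: "stor", en: "big", pos: "adjective" },
];

const verbsOnly = filterWords(sampleWords, "verb");
const adjectivesOnly = filterWords(sampleWords, "adj");
```

One side effect of plain substring matching is that typing "verb" would also match a word tagged "adverb". For a list of a few hundred words that trade-off is probably fine, since the list is short enough to scan by eye.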

Audio with the browser's speechSynthesis API

Swedish pronunciation is not obvious from spelling. Sju (seven) is a good example: nothing about those three letters prepares you for the sound it makes, nothing! So I added audio.

The browser has a built-in text-to-speech API called speechSynthesis that works on most modern devices at no cost. We (Claude and I) set the language to sv-SE and the rate to 0.9 (slightly slower than normal, better for learning at lower levels). Words play automatically when a flashcard or quiz card appears, and there is a replay button if you want to hear it again. Example sentences also play when you reveal them.
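
In code, the core of this is small. The sketch below uses the standard Web Speech API (speechSynthesis and SpeechSynthesisUtterance) with the sv-SE language and 0.9 rate mentioned above. The function name speakSwedish and the optional synth and Utterance parameters are my own illustrative additions so the function can be exercised outside a browser; they are not part of the trainer.

```javascript
// The function name and the injectable `synth` / `Utterance` parameters are
// illustrative. In a browser the defaults resolve to the real Web Speech API.
function speakSwedish(text, {
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance,
} = {}) {
  if (!synth || !Utterance) return false; // no speech support: skip audio

  const utterance = new Utterance(text);
  utterance.lang = "sv-SE"; // ask for a Swedish voice
  utterance.rate = 0.9; // slightly slower than the default of 1

  synth.cancel(); // drop anything still playing or queued
  synth.speak(utterance);
  return true;
}
```

A replay button can simply call speakSwedish again with the same word. Returning false when the API is missing lets the interface hide the audio button instead of failing.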

The quality of the voice depends on the device. On Mac and iPhone it is excellent. On some Android browsers it falls back to a more generic voice. That is one of the things on the improvements list.

In my ideal trainer, I want to have different voices and accents. But this will need to happen later. Diversity is king and queen and all in between!

Cloudflare analytics: finally knowing if anyone uses this

The trainer runs entirely in the browser with no server-side code, which means there was no way to know if anyone was actually using it. The fix was one script tag.

Cloudflare Web Analytics is free, requires no cookies, is GDPR-friendly, and gives you page views, unique visitors, countries, and devices. The tricky part was finding the right place in the Cloudflare dashboard (it is at the account level, not inside a specific domain) and adding the script to each HTML file on the Pi using sed rather than editing each file by hand.

What is next


If you want to try the trainer, it is here:

Svenska A1/A2 →

And if you are building something similar or have thoughts on any of this, I would love to hear from you. Email me or leave a comment.
