Am I doing a poor imitation of what garmin and strava already do? Yes. Am I learning? Yes. Am I having fun? Mostly yes. I think. I should hope so.
garth
In my previous post on the matter, I mentioned that my silly little CLI would not have the ability to auto-magically sync. And I was wrong.
Having dug into what garmindb does, I discovered garth. garth makes it super easy to grab and download data from garmin connect. Like ~one line easy: garth.Activity.list(). That's the bulk of it. Followed by some magic with zipfile.ZipFile() and io.BytesIO().[1]
So, I can now download new activities from garmin to my ✨database✨.
magic.
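A rough sketch of the zipfile.ZipFile() and io.BytesIO() part, assuming the download comes back as the raw bytes of a zip archive. The function names and the fake archive are made up for illustration; this is not the code in the tool itself, and the garth call that would fetch the bytes is deliberately left out.

```python
import io
import zipfile


def extract_fit_files(zip_bytes: bytes) -> dict[str, bytes]:
    """Return {member name: contents} for every .fit file in an in-memory zip."""
    fit_files = {}
    # BytesIO wraps the raw bytes in a file-like object, so nothing touches disk.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".fit"):
                fit_files[name] = archive.read(name)
    return fit_files


def make_demo_zip() -> bytes:
    """Build a tiny stand-in archive so the sketch runs without a garmin account."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("12345_ACTIVITY.fit", b"not a real fit file")
        archive.writestr("readme.txt", b"ignored")
    return buffer.getvalue()


demo = extract_fit_files(make_demo_zip())
print(sorted(demo))
```

In real use, the bytes handed to extract_fit_files would be whatever the download step returns, rather than the demo archive.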
duckdb
I am getting more comfortable with SQL and duckdb.[2]
Bulk-importing all my old .gpx files was reasonably straightforward; ditto the .fit files I downloaded from garmin. Having one .db file to back up feels a lot cleaner than the ~2.5k files that I've been struggling and failing to organise for a few years. With the added bonus of the .db being much smaller, and it can be made even smaller if I dump the tables[3] out to .parquet files.
I don't think I'm dealing with enough data to really benefit from duckdb over pandas. I could have skipped the database step altogether:
- read in a .gpx file
- do some light wrangling
- concatenate it to an existing .parquet file
I suppose I would then lose the security that maybe comes with a database, and the ability to enforce that keys exist in other tables...
As for speeed, again, I don't think I'm dealing with enough data. My points table with all the individual track points has 4.39 million rows, so maybe...
Once I've finished, I might[4] run a few speed comparisons. Since apparently that's a trendy thing to do.[5]
And most of the things I want this tool to be able to do sit under the "query" umbrella...
typer
I'm happy with my choice of CLI-maker-tool. This, coupled with poetry, has been mercifully straightforward to use.
And I now have a handful of commands that do things. These are invoked like so: trak import-bulk or trak cumulative-distance.
I've been diligently adding docstrings (not the best, but not nothing) to everything, which means I am also able to append --help and I get something that looks like:
$ trak filter-date --help
Usage: trak filter-date [OPTIONS] START STOP
filtering tracks by date. returns track_ids NOTE: if input is yyyy defaults to start of year. to get full year (i.e. all tracks in 2025) input: date-filter 2025 2026
Arguments
| start | TEXT | date string for start of time period: yyyy; mm-yyyy; dd-mm-yyyy. separators can be any of the following: /,.: | required |
| stop | TEXT | date string for end of time period: yyyy; mm-yyyy; dd-mm-yyyy. separators can be any of the following: /,.: | required |
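The date handling described in that help text could look something like the following. This is a sketch under assumptions, not the tool's actual parser: parse_date and in_period are hypothetical names, "-" is accepted as a separator because the format hints use it, and the stop date is treated as exclusive so that "2025 2026" covers exactly 2025.

```python
import re
from datetime import date


def parse_date(text: str) -> date:
    """Parse yyyy, mm-yyyy or dd-mm-yyyy; missing day or month defaults to 1."""
    # Split on the separators from the help text, plus "-" (an assumption).
    parts = [int(p) for p in re.split(r"[-/,.:]", text.strip())]
    if len(parts) == 1:
        day, month, year = 1, 1, parts[0]
    elif len(parts) == 2:
        day = 1
        month, year = parts
    elif len(parts) == 3:
        day, month, year = parts
    else:
        raise ValueError(f"unrecognised date string: {text!r}")
    return date(year, month, day)


def in_period(day: date, start: str, stop: str) -> bool:
    """True if day falls in [start, stop). The half-open interval is an assumption."""
    return parse_date(start) <= day < parse_date(stop)


print(parse_date("2025"), parse_date("03/2025"), parse_date("14.03.2025"))
```

With the exclusive stop, a ride on the last day of 2025 is inside "2025 2026" and one on the first day of 2026 is not.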
importers
For bringing data into the database. This is where the wrangling happens.
- download-fit-files
- import-bulk
- import-track
summaries
For creating tables of summary statistics & showing distance travelled by activity type for each year, along with some graphs.
- summary-by-year
- cumulative-distance
Many more of these to write...
filters
For searching the database, by time or location.
- filter-date
- filter-location
filter-location is a bit messy, but it works either by supplying a (lat, lon) and a radius, or a bounding box.[6] I think[7] filtering by location is where duckdb will outperform geopandas. Maybe. But again, I'm not dealing with HUGE data.
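The two flavours of location filter can be sketched in plain Python. This is only an illustration of the idea, not the duckdb query the tool runs: the function names are hypothetical, points are plain (lat, lon) tuples, and the bounding box ignores the antimeridian.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(points, centre, radius_m):
    """Keep the (lat, lon) points that lie within radius_m of centre."""
    lat0, lon0 = centre
    return [p for p in points if haversine_m(lat0, lon0, p[0], p[1]) <= radius_m]


def within_bbox(points, min_lat, min_lon, max_lat, max_lon):
    """Keep the (lat, lon) points inside a box that does not cross the antimeridian."""
    return [p for p in points if min_lat <= p[0] <= max_lat and min_lon <= p[1] <= max_lon]


points = [(55.9533, -3.1883), (55.9500, -3.1900), (51.5000, -0.1200)]
print(within_radius(points, (55.9533, -3.1883), 500))
print(within_bbox(points, 55.9, -3.3, 56.0, -3.1))
```

A bounding box is just four comparisons per point, while the radius version needs a distance calculation for each one, which is part of why a box makes a cheap first pass.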
plotting
uniplot is cool. I wish subplots were a thing. And I suppose they could be, by writing to a file, then sort of interleaving them...
misc
Some of my old .gpx files have some pretty funky characteristics. There are a few[8] with negative time steps in them.
There are many instances of a single .gpx file holding two short rides, separated by a week or more, that end up greatly exaggerating the duration.
I'm manually dealing with these issues, as that is likely the quicker thing to do.
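Finding the offending files is the part that is easy to automate, even if the fixing stays manual. A sketch, assuming each track is just a list of timestamps in recorded order; the function names and the 12-hour gap threshold are invented for the example.

```python
from datetime import datetime, timedelta


def negative_steps(times):
    """Indices i where times[i] is earlier than the point before it."""
    return [i for i in range(1, len(times)) if times[i] < times[i - 1]]


def split_on_gaps(times, max_gap=timedelta(hours=12)):
    """Split timestamps into segments wherever the gap exceeds max_gap."""
    segments = []
    current = []
    for t in times:
        if current and t - current[-1] > max_gap:
            segments.append(current)
            current = []
        current.append(t)
    if current:
        segments.append(current)
    return segments


t0 = datetime(2015, 6, 1, 10, 0)

# One time stamp goes backwards at index 2.
jumbled = [t0, t0 + timedelta(seconds=10), t0 + timedelta(seconds=5), t0 + timedelta(seconds=20)]
print(negative_steps(jumbled))

# Two short rides, eight days apart, recorded into the same file.
two_rides = [t0 + timedelta(minutes=m) for m in range(3)]
two_rides += [t0 + timedelta(days=8, minutes=m) for m in range(2)]
print([len(segment) for segment in split_on_gaps(two_rides)])
```

Summing the duration of each segment separately, instead of taking last minus first, is what stops the week-long gap from being counted as riding time.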
Stop detection
It is a tricky beast. I'm using movingpandas and my current definition of stopped is: spending two minutes within a 50 m radius. Typing that there, it feels wrong. But it's close enough. Ish. There are some instances where it doesn't clock a stop that is[9] & some where it registers a stop that isn't.[10]
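To make the "two minutes within 50 m" definition concrete, here is a naive pure-Python version. It is not how movingpandas does it: this sketch measures every point against the first point of the window, uses a flat-earth distance approximation, and detect_stops and distance_m are hypothetical names.

```python
import math
from datetime import datetime, timedelta


def distance_m(a, b):
    """Approximate distance in metres between two (lat, lon) pairs (equirectangular)."""
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dx = math.radians(b[1] - a[1]) * math.cos(mean_lat)
    dy = math.radians(b[0] - a[0])
    return 6_371_000 * math.hypot(dx, dy)


def detect_stops(track, radius_m=50, min_duration=timedelta(minutes=2)):
    """track: list of (timestamp, lat, lon) sorted by time. Returns (start, end) pairs."""
    stops = []
    i = 0
    while i < len(track):
        anchor = track[i][1:]
        j = i
        # Extend the window while the next point stays within radius_m of the anchor.
        while j + 1 < len(track) and distance_m(anchor, track[j + 1][1:]) <= radius_m:
            j += 1
        if track[j][0] - track[i][0] >= min_duration:
            stops.append((track[i][0], track[j][0]))
            i = j + 1  # carry on after the stop
        else:
            i += 1
    return stops


t0 = datetime(2024, 5, 1, 12, 0, 0)
# Moving north in steps of roughly 111 m, every 10 s.
track = [(t0 + timedelta(seconds=10 * k), 55.0 + 0.001 * k, -3.0) for k in range(5)]
# Standing still for three minutes, one point every 30 s.
track += [(t0 + timedelta(seconds=50 + 30 * k), 55.005, -3.0) for k in range(7)]
# Moving again.
track += [(t0 + timedelta(seconds=230 + 10 * k), 55.005 + 0.001 * k, -3.0) for k in range(1, 4)]

stops = detect_stops(track)
print(stops)
```

Both failure modes mentioned above show up in a definition like this: a recording gap hides a real stop because there are no points to measure, and anything slow enough to stay within 50 m for two minutes counts as one.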
Who cares? I'm having fun.
footnotes
[1] thank you stackoverflow (probably)
[2] both the duckdb CLI and the python API
[3] currently just two
[4] probably won't
[5] there are many posts dedicated to this, and many of them are a bit sloppy, if you catch my drift
[6] I was trying to see if I could use mapscii and then pipe in the bounding box from that...
[7] based on little more than a weak hunch
[8] 19
[9] like turning off the recording device whilst having a picnic, turning it back on, and moving > 50 m before signal is reacquired
[10] pushing a heavy bike up a hill