Lab Notes

notes on creative computers, programming, & design

Consolidating obsidian images as markdown images

2024-04-11

I’m adding a simple workflow for myself to publish micro-logs more regularly to my blog. I write these in Obsidian and copy them over to my Astro site, similar to the workflow that Tom MacWright uses. One frustration I have with Obsidian is its handling of images. When you drag in an image, Obsidian inserts it using its own embed syntax (![[image.png]]), which is not standard markdown. Then when you copy the markdown over, the images don’t come along with it.

I searched for an Obsidian plugin to help solve this, but none seemed to fit the bill. I toyed with writing my own, before realizing it would just be easiest to write a simple script to fix this. My favorite way to write scripts like these these days is with Bun, which has an ingenious new “Shell” API that makes writing shell scripts a breeze, with the convenience of TypeScript. With a quick and dirty regex, we can cover almost all my use cases. I also used the beloved magic-string library, which is great for editing a string by its original indices without having to think about how earlier edits shift the offsets.

Image files get copied over to an assets folder inside my blog folder, and then the markdown is modified to use markdown image links that point to the new files. Then we can cp this whole directory into our Astro content directory, and take advantage of Astro image processing. Works beautifully! Here’s the gist:

import { $ } from "bun";
import path from "node:path";
import MagicString from "magic-string";
import slugify from "slugify";

const VAULT = "/Users/guscuddy/Mainframe";

const file = Bun.argv.slice(2)[0];

const text = await Bun.file(file).text();

const r = /!\[\[(.*?)\]\]/g;

const IMAGE_EXTS = [".jpg", ".png", ".gif", ".svg"];

// fd treats its search pattern as a regex, so escape parentheses in the file name.
function escapeParentheses(str: string) {
  return str.replace(/([()])/g, "\\$1");
}

const matches = Array.from(text.matchAll(r));

const folder = path.dirname(file);

const s = new MagicString(text);

for (const match of matches) {
  if (IMAGE_EXTS.some((i) => match[1].endsWith(i))) {
    const escaped = escapeParentheses(match[1]);
    const slug = slugify(match[1], { lower: true });

    try {
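      // fd may print several matches (one per line); use the first one.
      // Named `source` so it does not shadow the outer `file`.
      const [source] = (await $`fd ${escaped} ${VAULT}`.text()).trim().split("\n");

      await $`mkdir -p ${folder}/assets && cp ${source} ${folder}/assets/${slug}`;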

      const startingIndex = match.index;
      const endingIndex = match.index + match[0].length;
      s.update(startingIndex, endingIndex, `![](./assets/${slug})`);
    } catch (e) {
      console.error(e);
    }
  }
}

const finalText = s.toString();

await Bun.write(file, finalText);

And here it is as a gist.
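
One case the quick regex does not cover: Obsidian lets you add a display size after a pipe, as in ![[photo.png|300]], and then the endsWith check misses the image. Below is a minimal sketch of one way to handle that. The names embedTarget, isImage, and rewriteEmbeds are hypothetical helpers, not part of the script above, and toPath is an assumption standing in for the slugify and copy step. Unlike the script, this version also matches extensions case-insensitively.

```typescript
// Hypothetical helpers, not part of the original script.
const EMBED = /!\[\[(.*?)\]\]/g;
const IMAGE_EXTS = [".jpg", ".png", ".gif", ".svg"];

// Returns the file name from the inside of an embed, dropping any "|300" suffix.
function embedTarget(inner: string): string {
  const pipe = inner.indexOf("|");
  return (pipe === -1 ? inner : inner.slice(0, pipe)).trim();
}

// Case-insensitive extension check (an addition; the script above is case-sensitive).
function isImage(target: string): boolean {
  const lower = target.toLowerCase();
  return IMAGE_EXTS.some((ext) => lower.endsWith(ext));
}

// Rewrites image embeds to standard markdown images; other embeds are left alone.
// `toPath` is an assumption: it maps a file name to the path used in the new link.
function rewriteEmbeds(text: string, toPath: (target: string) => string): string {
  return text.replace(EMBED, (whole: string, inner: string) => {
    const target = embedTarget(inner);
    return isImage(target) ? `![](${toPath(target)})` : whole;
  });
}
```

Because this is a pure string-to-string function, it can be tried on a sample note without touching the vault or the file system.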

Some notes on CSV spelunking

2024-04-11

Recently I made a web app to explore the They Shoot Pictures Don’t They list of greatest films of all time. TSPDT fortunately provides a massive spreadsheet export of their starting list of ~24,000 films. However, the data is a bit chaotic: you get an XLS file of all the films with title, director, year, country, length, color, genre, the ranking of the movie over time, an IMDB link, and a tspdt id (a universal identifier for future versions of TSPDT). I had attempted this before with earlier editions of the list, which did not include an IMDB link. Matching movies by title and year is doable, but not as accurate as having a canonical ID.

Luckily, the IMDB link is huge — via the IMDB ID we can look the movie up in The Movie Database (TMDB), which is a beacon of open, free APIs that one can build on (Letterboxd uses them, among many other apps). We can get photos, credits, descriptions, keywords, and more. Just need to use the Find by Id endpoint.
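
For reference, a lookup against the Find by ID endpoint could look roughly like this. It is a sketch rather than the code from the project: findUrl and findByImdbId are hypothetical names, the token parameter stands for your own TMDB API read access token, and only a few of the response fields are typed.

```typescript
// Sketch only: calls TMDB's "Find by ID" endpoint with an IMDb ID.
// Only a few fields of each result are typed here.
type TmdbMovie = { id: number; title: string; overview: string };
type FindResponse = { movie_results: TmdbMovie[] };

// Builds the request URL; external_source tells TMDB the ID is an IMDb ID.
function findUrl(imdbId: string): string {
  const url = new URL(`https://api.themoviedb.org/3/find/${encodeURIComponent(imdbId)}`);
  url.searchParams.set("external_source", "imdb_id");
  return url.toString();
}

// `token` is a placeholder for your own API read access token.
async function findByImdbId(imdbId: string, token: string): Promise<TmdbMovie | undefined> {
  const res = await fetch(findUrl(imdbId), {
    headers: { Authorization: `Bearer ${token}`, accept: "application/json" },
  });
  if (!res.ok) throw new Error(`TMDB responded ${res.status} for ${imdbId}`);
  const data = (await res.json()) as FindResponse;
  // An unknown ID comes back as an empty list, so this may be undefined.
  return data.movie_results[0];
}
```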

In order to get our XLS file in a workable format, we have a few choices. The most direct way would be to use any spreadsheet app to convert it to CSV. Once it’s in CSV, we can use D3’s csvParse method to get it into a JavaScript object we can manipulate. Not too bad!

But we’re left with one big problem: the IMDB Link is provided as a rich link, not just the URL. Oh, no! Why! When we export that to CSV, by default Numbers.app will just list that cell’s value as “IMDb”. Not helpful!

My Excel Fu is terrible, and I don’t have a copy of Excel, so I fired up Google Sheets. Unbelievably, Google Sheets does not give you an easy way to do this. The best answers I could find were to press Cmd+C and Cmd+V, which is obviously undesirable for 24,000 links! Google Sheets provides Apps Script, which lets me write JavaScript to extract a link. However, this didn’t work for a massive list of 24,000 films: the tab just froze for me.

Turns out there are two different kinds of hyperlinks in Google Sheets: rich hyperlinks, with no formula, and hyperlinks which are made with a formula. Written as a formula, hyperlinks can be extracted. But as a rich link, I had to turn to the weird world of Google Sheets add-ons. I found one that promised to extract URLs, gave it write access to all my spreadsheets (lol), and ran it. It worked for one cell, but once again gave errors when I ran it for the entire spreadsheet.

Sometimes, when chasing rabbit holes like this, it’s better to just do things the “unoptimal” way and move on rather than spend hours trying to find the perfect way. So I ended up selecting 1,000 rows at a time and just doing this 24 times to get the rich links transformed to a formula HYPERLINK(URL, DISPLAY) of sorts. Then I could write a regex to match the ID. Luckily this was trivial since the URLs were all formatted in the same way (with a trailing slash), so I could actually just call FORMULATEXT and do something like REGEXEXTRACT(FORMULATEXT(AB3), "title/(.*?)/").
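
The same extraction is easy to do outside of Sheets too. Here is a small TypeScript sketch using the same pattern as the REGEXEXTRACT call; extractImdbId is a hypothetical name, and the sample formula text is an assumption about what the cell contains.

```typescript
// Same pattern as the REGEXEXTRACT formula: grab whatever sits between
// "title/" and the next slash. Returns null when the text has no such segment.
function extractImdbId(formula: string): string | null {
  const match = formula.match(/title\/(.*?)\//);
  return match ? match[1] : null;
}

// Assumed shape of the formula text; the real cell contents may differ.
const sample = '=HYPERLINK("https://www.imdb.com/title/tt0033467/","IMDb")';
console.log(extractImdbId(sample));
```

The lazy (.*?) only works because every URL ends with a trailing slash; without it the pattern would not match at all.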

With a quick download to CSV, finally… we have a CSV with the IMDB IDs.

Transforming CSV into JavaScript Objects and Relational SQL

Once we have data in CSV form, it becomes a lot easier to parse and do things with.

D3 conveniently provides csvParse, which works great: you give it some CSV as a string, and it turns it into an array of JavaScript objects. It’s TypeScript-friendly as well, taking a generic of the column headers. So we can do something like this:

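// csvParse is exported by d3-dsv (and re-exported by the main d3 package).
import { csvParse } from "d3-dsv";

type MovieData = {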
  "2007": string;
  "2008": string;
  "2010": string;
  "2011": string;
  "2012": string;
  "2013": string;
  "2014": string;
  "2015": string;
  "2016": string;
  "2017": string;
  "2018": string;
  "2019": string;
  "2020": string;
  "2021": string;
  "2022": string;
  "2023": string;
  "2024": string;
  New: string;
  "Director(s)": string;
  Title: string;
  Year: string;
  Country: string;
  Length: string;
  Colour: string;
  Genre: string;
  "Dec-06": string;
  "Mar-06": string;
  IMDb: string;
  IMDB_ID: string;
  idTSPDT: string;
};

type MovieColumn = keyof MovieData;

// `text` is the exported CSV file's contents as a string.
const parsed = csvParse<MovieColumn>(text);

(Btw, I generated that initial type by just copying the columns and asking Copilot to turn it into a type.)
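
Since every value comes back as a string, the year columns need a little reshaping before they are useful. Here is a sketch of pulling a film's ranking history out of one parsed row. rankHistory is a hypothetical helper, and it assumes the year cells hold plain numeric ranks or are empty, which may not match the real export exactly.

```typescript
// A parsed row: csvParse hands back every cell as a string (or undefined).
type Row = Record<string, string | undefined>;

type RankEntry = { year: number; rank: number };

// Collects the four-digit year columns ("2007", "2008", ...) into
// { year, rank } pairs, skipping empty or non-numeric cells.
// Assumption: year cells hold a plain numeric rank or nothing.
function rankHistory(row: Row): RankEntry[] {
  return Object.entries(row)
    .filter(([column, value]) => /^\d{4}$/.test(column) && (value ?? "").trim() !== "")
    .map(([column, value]) => ({ year: Number(column), rank: Number(value) }))
    .filter((entry) => Number.isFinite(entry.rank))
    .sort((a, b) => a.year - b.year);
}
```

Note that the film's own Year column is not swept up, because the filter tests the column name rather than the value.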

Once we have everything parsed, we can loop through, query the TMDB API, and grab the movie. I saved all this into a SQLite database, extracted text embeddings from the overview and saved them into a vector database, but that’s beyond the scope of this post.

For more, you can view the codebase, the specific CSV parsing file, or look at the finished project.

Crisis of Worldless Individualism (Apple Vision Pro)

2024-02-02

Here are some initial thoughts on the Apple Vision Pro, which launched on February 2nd, 2024. I have not tried it yet — I’m just speculating.

It’s incredible the amount of effort we put into technology that does seemingly nothing for the betterment of humanity. Technology historically contains within it some emancipatory potential (i.e. in my view technology should be liberatory). The latest technological developments are not emancipatory, unless it’s framed only from the cruel perspective of liberating us from the ugliness of our own consciousness. There’s some libidinal psychology going on here; the Apple Vision Pro allows us to further repress the mysteries of our unconscious.
Marc Andreessen has spoken about ‘reality privilege’ when he talks about the virtues of the metaverse. No matter how hard we try, we just can’t seem to materially lift people out of poverty. We swear: there’s just no solution, be pragmatic. The most practical thing is instead to give people reality distorting goggles that transport your sense of self and space/time and consciousness. Not quite another world, but rather an eerie sense of worldlessness.
Reality Distortion: When you strap on the Apple Vision Pro (AVP), Apple wants you to not enter a VR world like the Oculus, but to be able to spatially compute through your existing surroundings. But the early reviews point out the video passthrough on the AVP is not perfect: it often has trouble with light. (Hm.) What you’re seeing is a simulacrum, a projection of your world captured on camera and sped through to your eyes at a high refresh rate. This is a fascinating and philosophical technical decision (or limitation); like the shadows in Plato’s cave.
You’ll notice that Apple, in their presentation, never uses the terms Augmented Reality (AR), Virtual Reality (VR), or the Metaverse. Instead, they seek to push the term Spatial Computing. Spatial computing feels very much in line with tools for thought land.
No longer is computing confined to rectangular screens, our eyes all day struggling as they stare at two dimensions trying to unlock a third or even fourth. There are certainly compelling ideas here.
Interestingly, Dynamicland and Folk Computer are also exploring a kind of Spatial Computing. But they do it through real physical objects: paper, blocks. Grounded in the real world. Bret Victor famously ranted against the limitations of glass screens. While there is exploratory potential to augmenting your surroundings with computing, doing so on a closed-off, individual headset — as opposed to being with others in space — and waving your hands, manipulating air… seems suspect.
I haven’t tried it yet, but how satisfying is it to type on an imaginary keyboard, your fingers wiggling through some limbic space-time continuum? Does it feel good to pick up the edge of a window, with no tactile feedback, and resize it?
Apple Vision Pro is too expensive to worry much about its immediate impact, but I worry that AVP will be like the Walkman for consciousness. It will lead us into a cold individualism that removes us from the world and from our community. The Walkman turned music inwards, away from the communal, into the individualized (the Walkman marked the movement of music from external to internal). With music, this can certainly be a nice thing. But what about with your whole vision? Projects like Dynamicland and Folk Computer are built around collective computing experiences. People doing stuff together in space. With Apple Vision Pro, you’re merely looking at strange glass, fingers pointing at moons.
It seems awfully lonely in there.

Arc Browser Act 2 (concerns)

2024-02-03

Not sure at all about the “Act 2” of Arc Browser as announced in “Meet Act II of Arc Browser - A browser that browses for you” on February 1st, 2024.

I’m disappointed that the company seems to be going down the AI rabbit hole, but strangely under the false guise of freedom from search engines and advertising. They present a classic “take out the middleman” disruptor energy that startups love to inhabit, as if they’ve revolutionized and/or democratized some core evil thing. I’m in agreement with them that Big Search™ is not good, and that Google has a monopoly on search. But AI as the answer is just… not it.
Instant Links kind of just seems like Google’s “I’m Feeling Lucky” with a few extra fancy language parsing features.
Live Folders could be interesting, but it does seem like we just keep re-inventing RSS, but with weirder proprietary systems. Because this is just RSS, right?
Arc Explore / Search - some pretty big red flags for me.
Yes, searching for recipes and getting listicles and pages with lots of ads sucks. Capitalism sucks and that part of the web is broken because of bad incentives. Anyway, the answer to me is not to build “a new category of software” which is basically just doing a Google search and then summarizing the first few web pages. The whole slick thing of “follow the money” and “we’re pulling the internet into the future” doesn’t add up. For one, the actual engineering of the core of Arc is still built on Chromium, instead of actually delivering competition to Google’s iron grasp on the web browser. (I get that Chromium is open source, but it’s still primarily a Google thing.) So the stuff about how bad Google is kind of rings false. Moreover, I do not trust LLMs to do the very important job of curation, which is so much of what the internet has moved away from with the algorithmic feed. RSS, bookmarking, blogs, forums… they aren’t perfect, we don’t want just nostalgia, but I’m not convinced at all that a browser that “browses for you” is the way forward.
The AI making the web page thing is strange, and the workflow they presented of trying to cook a dish felt incredibly contrived. AI has no sense of context or taste; does it know what a good cookbook looks like? Where’s the joy of browsing?
Calling this a new category of software also just seems… goofy. Framing it as some historically revisionist march towards rebellious justice rubs me the wrong way; it feels like a lot of hype hype hype for what amounts to a web browser with AI features. (Weirdly there’s prominently a Bernie sticker in the diner window behind Josh’s head; what was the curious decision making behind leaving that in? A socialist web browser this is not.)
And this is all with ignoring the most obvious problem to the AI search thing, which is the same as most all non-personal LLM things: what about attribution? What about copyright and credit? So the AI just steals a bunch of work and gives it to you? There’s a major ethical gray area that they kind of seem willfully oblivious to? I didn’t see anything presented that alleviated any concerns around that. Having tried the Arc Search iPhone app I’m left with more questions than answers.
That being said, I was an Arc early adopter and have liked some of what Arc has done (though to be honest I’ve grown more wary of the sidebar as good UX). More web browsers are good, we need more diversity! I’m just worried about the direction they’re taking with putting all their chips on AI. Some of the ideas they started out with (Easels, for instance) seemed more promising to me.
“A Browser that browses for you” is probably not going to be good for privacy.