I’ve only now begun to appreciate just how large a 1M-token context window is. It’s absolutely insane. I’ve been experimenting with how many posts I can classify in a single request, and the amount I can stuff into one request to Gemini is far more than with any other model.
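The batching pattern itself is as simple as it sounds: number the posts, cram them into one prompt, and ask for one label per line. A minimal sketch with the google-generativeai client; the label set and prompt format are placeholders I made up for illustration:

```python
# Batch classification via one long-context request. The labels and prompt
# are illustrative; the point is how many posts fit in a 1M-token window.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

def classify_posts(posts: list[str]) -> str:
    # Number each post so the labels in the response can be matched back.
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(posts))
    prompt = (
        "Classify each numbered post as CIVIC or NOT_CIVIC.\n"
        "Respond with one '<number>: <label>' line per post.\n\n"
        + numbered
    )
    return model.generate_content(prompt).text
```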
I read about Stack Overflow partnering with OpenAI and had the same question as everyone else: what the heck does Stack Overflow get out of this? The answer, it seems, is probably not that much. It looks like a way to cash out on their data trove while it’s still worth something (i.e., before the answers are all completely AI-generated junk). Understandably, Stack Overflow users are not happy and are deleting their answers.
Microsoft created generative AI for spies, apparently. Setting aside the “bloated US government” and “enterprise tech” jokes, my take is: why the heck are you telling us what US spies have access to? Leave it to a Big Tech company to know a thing or two about not keeping data secret.
There’s a bit of hype around xLSTM, a souped-up LSTM variant that can supposedly scale in ways that transformers can’t. My understanding is that transformers won out over LSTMs by fixing their problems with forgetting and limited context windows, at the expense of being drastically more complex. I guess what’s old is new again, and it’s good proof that solid knowledge of the fundamentals always pays off.
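As a refresher on those fundamentals, here’s one step of a classic LSTM cell in plain numpy. To be clear, this is the original formulation, not xLSTM’s (which, as I understand it, swaps in changes like exponential gating and matrix memories):

```python
# One step of a vanilla LSTM cell, for intuition only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # W projects [input, previous hidden] onto the four gate pre-activations.
    z = W @ np.concatenate([x, h_prev]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget / input / output gates
    c = f * c_prev + i * np.tanh(g)  # the forget gate decides what the cell keeps
    h = o * np.tanh(c)
    return h, c
```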
Sam Altman’s been on a wild publicity tour for GPT-5, and OpenAI’s been making a lot of moves as of late. GPT-4 is already good and is SOTA, and GPT-5 is supposed to blow it out of the water? That’ll be impressive.
Software engineering
Working with data from a scrappy startup like Bluesky definitely makes me aware that I have to expect frequent API changes that affect me. As a result, I’ve been leaning into NoSQL databases (I’ve started using MongoDB) and away from SQLite. I initially used SQLite to enforce strict schemas on ingestion, but I realized that Pydantic provides the same guarantees with much more flexibility. I’ve invested a lot of developer time over the past two weeks just going back and adding Pydantic models, and it’s been a huge help: I’ve found so many silent bugs (inconsistent fields, values with incorrect types, etc.), and after adding the models I have much more confidence in the robustness of the ETL pipelines.
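The pattern is dead simple. A sketch with invented field names (my actual models are bigger), using Pydantic v2:

```python
# Validate raw records on ingestion instead of relying on a SQLite schema.
# Field names here are made up for illustration.
from pydantic import BaseModel, ValidationError

class Post(BaseModel):
    uri: str
    author_did: str
    text: str
    like_count: int = 0  # coerces "3" -> 3, but rejects non-numeric junk

def validate_batch(raw_records: list[dict]) -> list[Post]:
    valid, bad = [], []
    for record in raw_records:
        try:
            valid.append(Post.model_validate(record))
        except ValidationError as err:
            bad.append((record, err))  # surface the bug instead of ingesting it
    print(f"{len(bad)} of {len(raw_records)} records failed validation")
    return valid
```

The nice part is that a ValidationError points at the exact field and expected type, which is how most of those silent bugs surfaced.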
I’m looking forward to Bluesky implementing filtering on their firehose, which looks to be on their 2024 roadmap. Working with the entire firehose feels unwieldy, especially at scale: anyone who wants to create a custom feed has to store their own copy of the firehose’s posts and filter afterwards (filtering inline slows consumption to a crawl). It’s OK for now, and it makes sense that this has stayed on the wayside; they’re a small team of devs, after all. Looking forward to this going through.
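For anyone curious, here’s roughly what the store-then-filter approach looks like with the (third-party) atproto Python SDK. I’ve elided details like CAR block decoding, and the SDK’s API may have drifted since I wrote this:

```python
# Subscribe to the whole firehose and drop everything that isn't a new post.
# handle_post is a stand-in for my actual validation/storage step.
from atproto import FirehoseSubscribeReposClient, models, parse_subscribe_repos_message

client = FirehoseSubscribeReposClient()

def handle_post(commit, op) -> None:
    print(commit.repo, op.path)  # placeholder for the real pipeline

def on_message(message) -> None:
    commit = parse_subscribe_repos_message(message)
    if not isinstance(commit, models.ComAtprotoSyncSubscribeRepos.Commit):
        return
    for op in commit.ops:
        # Every repo event arrives regardless; the filtering happens client-side.
        if op.action == "create" and op.path.startswith("app.bsky.feed.post/"):
            handle_post(commit, op)

client.start(on_message)
```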
Research
Stanford HAI published a recent think piece about their research on creating social media algorithms that promote social values, which is (somewhat) related to what I’m working on. I think this is a step towards work that acknowledges that social media algorithms are never going to be truly neutral: even optimizing for engagement explicitly prioritizes content that is more engaging rather than “socially rewarding”. The paper is interesting, and I’m curious to see where they take the research next. It looks like they ran into problems similar to ones I’ve encountered in my own work, and they address them in ways that I’m making note of and will definitely consider.
I finally got to read the Bluesky technical paper, and even though a lot of the protocol and decentralization content was outside my wheelhouse, it was nice to read a technical systems overview of a codebase that I’ve worked with pretty extensively as a consumer.
I’ve been reading up on how researchers detect toxicity online, and it seems like we do well at defining the kinds of content we don’t want online, such as toxic or hateful speech. But there’s been only a small (though growing) amount of work on defining the kinds of content we want more of. The Stanford paper is one example; another is this paper on classifying constructive speech. This sent me down a rabbit hole of reading papers on constructive speech and learning about what constitutes “unhealthy” speech.
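On the detection side, things are mature enough that a classifier is nearly off-the-shelf. A quick sketch with a public Hugging Face model; the model choice is just an example of the genre, not what any of these papers used:

```python
# Off-the-shelf toxicity scoring via a public model; requires transformers + torch.
from transformers import pipeline

toxicity = pipeline("text-classification", model="unitary/toxic-bert")

posts = [
    "you are all idiots",
    "great point, here is a source that adds some context",
]
for post in posts:
    print(post, "->", toxicity(post))
```

As far as I can tell, there’s no equivalently standard model for “constructive” or “healthy” speech yet, which is exactly the gap this line of research is poking at.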
I think that bridging algorithms are an interesting approach to reducing polarization. The idea makes sense in theory, but there’s still open work on how to apply it in practice. I’ll have to read up more on it; admittedly, I’m not as familiar with the theory as I’d like to be. Google Jigsaw published an interesting paper on the effects of upranking content in feed algorithms based on prosocial attributes.
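My (possibly naive) mental model of the upranking idea: blend an engagement score with a bridging/prosocial score before sorting the feed. The weight and scores below are invented, and the Jigsaw paper’s actual method is surely more involved:

```python
# Toy reranker: uprank posts that score well on a hypothetical prosocial signal.
def rank_feed(posts: list[dict], prosocial_weight: float = 0.3) -> list[dict]:
    def score(post: dict) -> float:
        return ((1 - prosocial_weight) * post["engagement"]
                + prosocial_weight * post["prosocial"])
    return sorted(posts, key=score, reverse=True)

feed = rank_feed([
    {"id": 1, "engagement": 0.9, "prosocial": 0.0},
    {"id": 2, "engagement": 0.7, "prosocial": 0.9},
])
# Pure engagement would rank post 1 first; the prosocial term flips the order.
print([post["id"] for post in feed])  # [2, 1] with the default weight
```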
Personal
My daily question streak with my girlfriend on the Agape app has reached 117 days! It’s been a good daily practice, one of many that we do to make sure our relationship gets the attention it deserves, even amidst the daily routines of life. It would be at ~200 days by now, except that I missed one day while on a plane from America to the Philippines (note to future self: plan ahead!).
I’m ~50% through Living Life Backward. It’s been a really profound book and has caused me to read Ecclesiastes in a very different light. I agree with its premise: you won’t truly appreciate life until you meditate not only on its finiteness, but also on the fact that all our labors will wither away. It’s a perspective that is both daunting and relieving, and it’s gotten me thinking about doing the right thing simply for its own sake. This blog post explains it well: the joy of life is much easier to appreciate when we’re mindful of death.
I’ve officially been a Reverend for 1 year! I am certified to officiate weddings (in Las Vegas).