Will AI kill sites like Booking.com? AI as a Personal Aggregator

I talked a little bit about my thoughts on AI Personal Assistants previously. I find the idea very exciting! My general sense is that they are more complex to implement than people first imagine, but will be brilliant for relatively simple tasks that don’t involve a high degree of personal preference. What I didn’t explore in that article is the effect of their successful widespread adoption.

AI as a Personal Assistant

The interaction model suggested for these personal assistants is fairly straightforward: you say “Get me a taxi there” or “Bring me a pizza here” and any AI Assistant worth its salt will be able to translate your request into a series of necessary actions and negotiate with whichever third-party services are required to achieve the desired outcome.

If we ask an AI assistant to book a taxi, the AI itself will then interact with another service, whether via an API or (as the rabbit r1 does) directly interacting with websites and apps on your behalf, parsing the result and then feeding that back to you in the form of your conversation.

Large Action Model
The rabbit r1 has a “Large Action Model” (LAM). The LAM translates your request into an action that it can perform on any given app, with the ability to learn how to perform actions on Apps that it hasn’t already been trained on.

For the rabbit r1, that means connecting your existing services to the platform. If you want your AI to be able to book a taxi then you will need to connect your Uber account first. The same with DoorDash for a pizza or Spotify for music.

That appears to be a perfectly sensible approach. We hand over control of our other accounts so that we can let our assistants use them on our behalf, as we might a human assistant. Now, instead of tapping on a screen, we’ll just speak or type and get exactly what we want from the connected apps.

Under the above model, we’re applying an existing and accepted paradigm to the new technology. In this case, we’re replicating what a human would do: open an app and get a result. The perceived benefit is in our own interaction – or lack thereof – with the end service.

From Stratechery:

When it comes to designing products, a pattern you see repeatedly is copying what came before, poorly, and only later creating something native to the medium.

It feels like this is what is happening here. Connecting the AI to an app is just a copy of what came before. So, what is the native approach that is only possible as a result of the new technology?

Well, the huge benefit of an AI is not that it can simply replicate human functionality as a human assistant would, it’s that it can blend that with the more “traditional” functionality of computers that humans are not necessarily good at.

One of those strengths is doing multiple things, at the same time, very quickly.

Aggregators and the Status Quo

It won’t surprise you to learn that very few services on the internet are truly benevolent gifts to society. Wikipedia perhaps. Archive.org, maybe.

Most websites are balancing their own motivation to make money with the quality of the service that they provide to users. Google is great, but the days of 10 blue links are long gone – the results for any mildly competitive search term are headed with at least a screen of ads. Price comparison sites like Booking.com appear to exist to make customers’ lives easier but their revenue comes from the supply side, typically in the form of referral fees or ads.

Ben Thompson’s Aggregation Theory defines how these businesses work and the framework that they operate under. The internet is full of examples of companies that have, by securing both sides of a market, built incredibly successful businesses. One of his primary examples of an aggregator is Uber.

The initial technical idea behind Uber (“a taxi at the tap of a button, with live tracking on a map and easy payment”) was a good one! At launch, it was also relatively difficult to replicate. But not impossibly hard. Uber knew this; technology was not a moat on which it could rely. To build a real moat, they needed to invest in aggregating both sides of the market:

Supply (Driver Network): Uber spent a lot of money rapidly building an extensive driver network to serve its app, ensuring there were always drivers available. They did this initially by paying drivers very well, with generous per-journey earnings, bonuses and referral rewards.

Demand (Passengers): Uber aggressively invested in marketing and branding to build its customer base, especially in building referral networks (at one point I had hundreds of pounds of credit on the app, just from these referrals). Uber’s prices when they launched were also ridiculously low. When Uber launched in London, a journey in an Uber Black Mercedes S-Class (the only Uber that was available) was cheaper than a Black Cab.

Uber spent billions of dollars executing all of these strategies and, through its aggregation of taxi supply and passenger demand in a given locale, “won” many major cities. But that win came at a huge cost, subsidised by investors. Eventually, it had to normalise costs; driver pay went down and passenger costs went up. The moat that they’d built was wide and deep, but competitor services popped up and built their own networks by offering incentives similar to those Uber had.

The key element for Uber though is that it still has the most users, who return to the app because it still has the most available taxi drivers, who in turn return to the app because it has a consistent supply of customers.

The actual service that Uber offers though is not highly differentiated. In any given major metropolitan area, there are at least 4 or 5 significant apps that provide the same service – routing a Toyota Prius to the customer to take them where they want to go – at around the same price point. Ultimately their product is a commodity, with little differentiation from other competitive apps. They, to some extent, rely on customers’ (a) habitual use of their app and (b) lack of motivation to check every service at once for the best price/arrival time.

What happens though if we introduce an intermediary into the equation which is motivated to check every service for the best deal? Say, an AI which can do multiple things, at the same time, very quickly?

AI as a Personal Aggregator

This is the big paradigm shift: introducing the Personal Aggregator. Instead of a human user using one platform to look for a taxi, why would AI not look at all of the platforms to find the best match?

An illustration of an AI checking multiple apps at once

Of course, this doesn’t apply just to taxis. A sufficiently powerful AI will be able to rely not just on the results of Booking.com, Amazon (not technically an aggregator) or DoorDash, but the entire internet.

The fundamental shift here is in the motivation of the AI itself. An AI Assistant is likely part of the user’s personal infrastructure – not that of a third party motivated by the desire to make profit. Of course, there is an assumption here, but I think it’s a fair one. The early generalised intelligence that we’re seeing produced by the likes of OpenAI isn’t focussed on doing just one task well; it does everything, from history homework and language translation to coding and, eventually, price comparison across the internet.

With a motivation of always returning to the user the best possible deal for a given stay, product or other service – without concern for who the seller is – and having looked at the entire market, an AI assistant stands to be the ultimate aggregator.

As part of our personal infrastructure, the motivation of the AI will be to go wherever it needs to in order to get the right deal, using whatever blend of requirements the user has provided along with the historical preferences it has learnt about them. For our taxi example, one person’s personal aggregator may aim for the cheapest possible car at the cost of comfort and expediency. The same AI acting as a personal aggregator for someone else might prioritise comfort over cost, and so on.
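To make the “check everything at once” point concrete, here’s a toy sketch of how a personal aggregator might fan out across providers and score the results against a user’s preferences. Everything here is hypothetical – the providers, the numbers and the scoring weights – and a real assistant would be calling real APIs (or driving apps directly) rather than faking quotes:

```python
import asyncio
import random
from dataclasses import dataclass

@dataclass
class Quote:
    provider: str
    price: float        # estimated fare
    eta_minutes: float  # minutes until pickup
    comfort: float      # 0.0-1.0, rough proxy for vehicle class

async def fetch_quote(provider: str) -> Quote:
    # Hypothetical: a real aggregator would call each provider's API
    # (or drive its app/site) here. We fake a network call and a quote.
    await asyncio.sleep(random.uniform(0.1, 0.3))
    return Quote(provider, random.uniform(18, 35), random.uniform(2, 12), random.random())

def score(quote: Quote, prefs: dict[str, float]) -> float:
    # Lower price and ETA are better; higher comfort is better.
    return (prefs["price"] * -quote.price
            + prefs["speed"] * -quote.eta_minutes
            + prefs["comfort"] * 10 * quote.comfort)

async def best_ride(providers: list[str], prefs: dict[str, float]) -> Quote:
    # Fan out to every provider at once, then pick the best match.
    quotes = await asyncio.gather(*(fetch_quote(p) for p in providers))
    return max(quotes, key=lambda q: score(q, prefs))

if __name__ == "__main__":
    # One user weights price heavily; another might weight comfort instead.
    price_first = {"price": 1.0, "speed": 0.2, "comfort": 0.1}
    print(asyncio.run(best_ride(["ServiceA", "ServiceB", "ServiceC", "ServiceD"], price_first)))
```

Swap the weights and the same code books a comfortable ride for one user and the cheapest one for another – exactly the per-person behaviour described above.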

The fundamentals of how we interact with a huge array of businesses could completely shift, not just because using the AI is easier, but also because it is the best way to secure the best deal. The widespread adoption of this type of interaction, with an AI as personal aggregator, could have massive consequences for all sorts of businesses, and not just the existing aggregators.

If we go beyond the creation of an increasingly competitive (but arguably more efficient) market, the intermediating effect of the AI will disrupt one of the core tenets of Aggregation Theory: that of the direct relationship with the end user. When that direct relationship is removed in favour of the ease of use and better user experience of the AI, it is not just the transaction that is being intermediated, nor the buying decision, but also the pre-buying decision. What is the point in marketing a commodity product from Store X to buyers if they’re going to buy it via their AI?

It may feel that we’re a while off this reality, but as Bill Gates said “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten”.

The effects on the markets in which these commodity products exist will be drastic, but there will also be an increased incentive to differentiate.

Embracing the New Paradigm

The advent of AI as a personal aggregator heralds a transformative era in how we interact with and perceive the digital marketplace. This shift isn’t just about convenience or efficiency; it’s about a fundamental change in the power dynamics between service providers, aggregators, and consumers. AI won’t necessarily be great at telling us what to buy, but it will be great at telling us where to buy it.

For decades, aggregators have thrived by simplifying choice and monopolising user attention. They’ve dictated market terms, often at the expense of both service providers and consumers. However, as AI steps in as the ultimate intermediary, it will potentially democratise access to the market. By evaluating all available options to identify the best deal, AI will disrupt the traditional aggregator model, shifting the focus back to the quality and value of the service itself.

This change will compel service providers to innovate and differentiate. Yes, there may initially be a race to the bottom on price, but differentiation will then have to come from enhancing quality and user experience.

Moreover, the rise of AI assistants as personal aggregators raises important questions about the future of marketing and customer relationships. As AI becomes the gatekeeper of consumer choices, traditional marketing strategies might lose their efficacy. Businesses will need to adapt by finding new ways to engage with both AI systems and their users, focusing more on the intrinsic value of their offerings rather than just visibility and brand recognition.

The emergence of AI as a personal aggregator is not just a technological advancement; it is potentially a catalyst for a more equitable and consumer-centric marketplace. It will challenge existing business models and force a rethink of market strategies across various sectors. We’re not there quite yet, but we are standing on the brink of a paradigm shift and it is essential for businesses to understand and adapt to these changing dynamics. The future promises an era where choice is not just about availability, but about relevance and quality, in which consumers are hopefully empowered by AI like never before.

Generative Video: Targeting You Soon

On Thursday OpenAI released Sora, their text-to-video model. It is remarkable, and I plan to write more about it and the future of video more generally in a forthcoming article.

Here’s an example of what it can do…

Prompt:

A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

Output:

This video was generated completely by the AI model. No 3D rendering or other inputs. It is, simply put, lightyears ahead of what we’ve seen from the likes of Google’s Lumiere and Runway’s Gen-2 which were the state of the art.

The most impressive details aren’t on the main Sora page though, they’re hidden away in the technical report. Not only can Sora generate entirely fresh video as shown above, but it can also transform existing videos significantly, based on a text prompt:

Prompt:

change the setting to the 1920s with an old school car. make sure to keep the red color

Input:

Output:

There are obviously huge implications to this for video entertainment, but one of the more fascinating aspects of this for me is in what it means for ad creative.

Yes, we might not like video ads but they are sadly here to stay. One of the reasons we don’t like them (beyond the interruption) is that they often feel so un-targeted and repetitive. That’s not entirely surprising given that the cost of making a 30-second commercial runs into the tens of thousands of dollars, to say nothing of the time it takes to produce.

From the preview that we saw on Thursday though, these videos can be made in less than 10 minutes and at significantly reduced cost. The prospect of seeing hyper-targeted videos akin to last decade’s oddly specific t-shirt ads is not appealing, but from a brand perspective, being able to quickly and cheaply produce not only a series of ads targeted at each and every demographic but also alternatives to A/B test with is a marketer’s dream.

Film one video and then have AI produce modified versions for different ages, locales and cultural norms – a little dystopian? Probably. Inevitable? Absolutely.

Sure, we’re (maybe) unlikely to see this footage used in the Super Bowl, but as YouTube ads? Absolutely. And really, I’m not entirely averse to receiving YouTube ads that fit with my interests as well as Instagram ads tend to.

It’s not just preplanned creative that will benefit here either – social media teams seeking to produce responsive ads that match the calibre of Oreo’s famous 2013 tweet in response to a power cut at the Super Bowl now have a tool that can produce high-production quality video in minutes.

Apple and AI

Media speculation has been swirling around the idea that Apple is lagging behind in the AI race.

The headline advances in AI over the past couple of years have been generative technologies, and if you’re comparing chatbots then, yes, Siri looks like ancient technology. Apple’s approach to AI, though, has been to infuse it throughout iOS, with the goal of incrementally improving the user experience without highlighting to users just how much of the heavy lifting the AI is actually doing.

This is consistent with the Steve Jobs playbook: “You’ve got to start with customer experience and work backwards to the technology. You can’t start with the technology.” Apple aren’t focused on making AI the centrepiece of their product launches and have previously gone out of their way to avoid using the term, preferring “Machine Learning” to describe the technologies that underpin the experience. But if you look around iOS, it is omnipresent.

Screenshots of AI in use on an iPhone

Features that rely on Artificial Intelligence have been integrated throughout iOS: Cameras produce a quality of image far beyond what is achievable solely with the tiny optics fitted to the phone. Messaging has translation baked in. And swiping up in Photos gives options to look up the Artwork (or Landmark, Plant or Pet) shown in the image.

The approach that Apple takes is quite different from that of other manufacturers – these improvements aren’t flagship features, but relatively subtle improvements to the overall product experience that work towards that overarching goal of improving the customer experience.

Generative Hype

On Thursday, during an earnings call, Tim Cook addressed the question of AI directly, stating:

As we look ahead, we will continue to invest in these and other technologies that will shape the future. That includes artificial intelligence where we continue to spend a tremendous amount of time and effort, and we’re excited to share the details of our ongoing work in that space later this year.

Let me just say that I think there’s a huge opportunity for Apple with Gen AI and AI, without getting into more details and getting out in front of myself.

This is the most direct indication that the company is looking to bring more generative AI features to iOS. What form that will take is still speculation, but we can assume that one area that will be addressed is Siri.

What is Siri?

To most, Siri is Apple’s slightly confused voice assistant.

Whilst this part of Siri desperately needs improvement in the context of the other services that users have become familiar with, I think that it’s unlikely that we’ll see Apple release a chatbot with the freedom that GPT (or Bing) had. I just can’t see a world where Apple allows Siri to tell users they are not good people.

Apple has an additional challenge in their overall approach: their privacy-focused stance has seen more and more of their ML tasks performed on-device, and Apple’s long-term investment in the dedicated Neural Engine core (first introduced in 2017) has demonstrated that their strategy is very much focused on doing as much as possible without leaving the device. This results in some limitations in both the size and quality of the model that underpins Siri – what ChatGPT achieves running in a Microsoft data centre, Siri needs to achieve within the phone.

The slightly lacklustre voice assistant isn’t Siri though. Voice is simply one of the interfaces that allows users to interact with Siri. Look for Siri elsewhere and you will start to see that Apple considers Siri to be almost everything that is powered by AI.

When widgets were introduced in iOS 14, Apple included one particular widget which I think hints at the direction of Apple’s eventual longer-term AI integration: Siri Suggestions.

The widget is actually two widgets in one: a curated selection of 8 apps that change based on context and location, and a set of suggested actions based on what Siri anticipates you will want to do within apps, again based on your context. Whilst I think both are brilliant, and I use both types on my own home screen, it is the second that I think gives the best indication of where Apple’s AI strategy is heading.

Screenshots of Siri Suggestions

Apple provides the ability for apps to expose “Activities” to the wider operating system. Whether that is sending a message to a specific friend, booking a taxi or playing the next episode of a show, each activity is available to the widget without needing to go into the app to find it.

Within the widget, Siri then presents what it thinks are the most relevant activities for a given time and place. Arrive at a restaurant at 8pm and don’t look at your phone for two hours? Don’t be surprised if the top suggestion when you do is to book a taxi back home. Usually call your parents at 7pm on a Sunday? Expect a prompt to call them to appear. By combining the contextual clues the operating system has, its data on your historical patterns and the activities available to it, Siri can predict what you want to do with remarkable accuracy.

The most notable element is that the focus here is on actions using apps rather than on the app itself. This returns us to the primary driver of good user experiences; helping the user to achieve their desired outcome in the easiest possible way.

Actions, Activities and App Clips

Given that many people have spent the past decade using a smartphone, it is not uncommon to have hundreds of apps installed, most used extremely rarely. I, for some reason, still have an app for Parc Asterix installed despite last visiting for just one day nearly 4 years ago.

We’re moving away from the days of “There’s an app for that” and into the days of “Why do I have to download an app to do that?”. Apple’s solution, introduced in 2020, is App Clips.

App Clips are a way for developers to provide access to features of their app without having the user download the full app. They’re often contextual – a restaurant could provide access to its menu, ordering and payment through an App Clip triggered by a QR code or NFC tag on a table. In Apple’s words: “App Clips can elevate quick and focused experiences for specific tasks, the moment your customer needs them.”

Whilst I’ve rarely seen App Clips used in the wild, I sense that this is another example of Apple moving pieces into place as part of their future strategy.

Fewer Apps, more Actions

By encouraging developers to provide access to specific actions or features within an installed app or via an App Clip, Apple has created an environment for their devices in which Siri can provide users with the correct action based on context, potentially without users even needing to have the app installed.

As Siri’s features become more powerful, I predict that Apple will start to leverage the actions more and more, potentially even handing off the interaction entirely to Siri.

Concept screenshots of how Parc Asterix might exist if not as an app

Take the Parc Asterix app for example – my ideal user experience is that my phone knows when I’ve arrived, checks my emails for any tickets that I already have and presents the option to buy them natively (no downloading apps) when I don’t. When I’m inside the park, I want it to provide me with easy access to an App Clip which contains a map and ride waiting times. But then I want to be able to leave and not have yet another app that won’t be used for years.

Apple’s headstart

It’s easy to point at Siri’s chat functionality and suggest that Apple is falling behind, but I think the reality is quite different. Apple has spent almost a decade building AI tools that seamlessly integrate with the operating system. They have the greatest level of access to information possible because, for most people, our lives live within our phones. I want to see Apple leveraging that and integrating AI throughout the OS to work for me and make my life that much easier.

Where the likes of rabbit have been working on Large Action Models for the past year, Apple has been at it for a decade.

I do hope that Siri’s chat functionality gets a lift this year, but I don’t think that should be the focus. I want a device that anticipates me and understands what I need to make my life easier. Apple, more than anyone, is able to deliver on that.

How hard is being an assistant anyway?

A year ago, Ben Thompson made it clear that he considers AI to be an entirely new epoch in technology. One of the coolest things about new epochs is that people try out new ideas without looking silly. No one knows exactly what the new paradigm is going to look like, so everything is fair game.

The devices we carry every day have pretty much not changed for 15 years now. Ask a tweenager what a phone looks like and in their mind, it has likely always been a flat slab of plastic, metal and glass.

There are attempts to bring new AI powered devices that are accessories to phones to the fore – I wrote about the Ray-Ban Meta glasses last year – but it has taken until the past couple of months to see devices emerge that are clearly only possible within this new era.

Enter the Humane Ai Pin and the rabbit r1.

An illustration of the rabbit r1 and the humane Ai Pin devices

Both are standalone devices. Both have a camera and relatively few buttons. Both sport very basic screens (one a laser projector!⚡️) for displaying information to the user rather than for interaction. And both are controlled, primarily, by the user’s voice interactions with an AI assistant.

In theory, these are devices that you can ask to perform just about any task and they’ll just figure out how to get it done.

As a user, this sounds like the ideal scenario, the ultimate user experience. Issue a simple request and have it fulfilled without further interaction. The dream, like an ever-present perfect human assistant.

The Perfect Assistant

But let’s put aside the technology for a moment and figure out what we would expect of that perfect human assistant.

For the sake of this thought experiment they are invisible, always there and entirely trustworthy. Because we trust them, we would give them access to absolutely everything; email, calendar, messages, bank accounts… nothing is off limits. Why? Because they will be more effective if they have all of the same information and tools that we do.

So, with all of that at their disposal, the assistant should be able to solve tasks with the context of the rest of my life and their experience of previous tasks to draw on.

Simple Tasks That Are Actually Quite Complicated

We’ll start with an easy task: “Book me a taxi to my next meeting.”

The assistant knows where you currently are, so they know where to arrange the pickup from. And they have access to your calendar, so they know where the next meeting is, and what time it is. They can also look on Google Maps to check the traffic and make sure that you’ll be there on time. They know that you have an account with FreeNow, and prefer to take black cabs when you’re travelling for work. And so, when you ask them to book a taxi, they can do so relatively easily, and you will get exactly what you need.

A graphic detailing the parts of the simple task of booking a taxi

Exclude one of those pieces of information though, and you will not necessarily end up with the desired result. And that’s for a relatively straightforward request.
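As a rough illustration of why each piece of information matters, here’s what that “simple” taxi task looks like as a dependency chain. Every helper below is a made-up stand-in for a real data source (device location, calendar, a maps API, learned preferences):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Everything here is a hypothetical stand-in; the point is the dependency
# chain. Each step feeds the next, so one missing piece of information
# (no calendar access, no saved preference) breaks the whole task.

@dataclass
class Meeting:
    title: str
    location: str
    start: datetime

def get_current_location() -> str:
    return "Home"                                  # would come from the device

def get_next_meeting() -> Meeting:
    return Meeting("Board review", "14 Wall St",   # would come from the calendar
                   datetime.now() + timedelta(hours=2))

def estimate_travel_time(origin: str, destination: str) -> timedelta:
    return timedelta(minutes=35)                   # would come from a maps API

def book_taxi_to_next_meeting() -> str:
    pickup = get_current_location()
    meeting = get_next_meeting()
    travel = estimate_travel_time(pickup, meeting.location)
    leave_at = meeting.start - travel - timedelta(minutes=10)  # small buffer
    # The provider and vehicle class would come from learned preferences.
    return f"Black cab booked from {pickup} to {meeting.location} at {leave_at:%H:%M}"

print(book_taxi_to_next_meeting())
```

Remove any one of those stubs – no calendar access, no saved preference for black cabs – and the chain either breaks or produces the wrong booking, which is exactly the point.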

When you make the request more complex, the level of information and the variety of subtasks required becomes huge. “I need to meet with Customer X at their office next Wednesday”. If you’re in New York and your customer is in Austin, TX, there are flights to be arranged, hotels to be booked and transfers in between, not to mention the diary management required.

A graphic detailing the parts of the complex task of booking a trip

These are also pretty normal requests of an assistant – things that happen the world over every day but which are, when broken down, incredibly complex and made up of various interdependent subtasks. Each of them is important and if one of the subtasks fails, the entire task can go wrong.

The perfect assistant though, would be able to handle all of this without breaking a sweat, and we would rely on them because we trust in one of the hardest human traits to replicate: their judgement.

Enter the AI-ssistant

The inferred promise in the rabbit r1 keynote is that you will be able to say “I want to go to London around 30th January – 7th February, two adults and one child, with non-stop flights and a cool SUV” and it will be able to plan, arrange and book the entire trip for you.

This, if it is true, is remarkable, precisely because of how complex and interlinked tasks actually are. If we remove the professional elements from the above example, the sub-tasks involved in booking a trip like this and the understanding required are still huge.

I think the r1 is a cool concept, but the hand-wavey elements of the keynote (”Confirm, confirm, confirm and it’s booked”) are alarming, precisely because those are the actually hard parts. Getting ChatGPT to spit out a travel itinerary is easy but actually having an AI that is able to follow through and execute properly on all of the required tasks is another matter.

Don’t misunderstand me, I fully believe that an AI could navigate a webpage and follow the process to select and pay for things. I can see in the keynote that the r1 has the ability to connect to Expedia and would bet that it can book a hotel on the site.

My quandary is that when I, as an actual human™, go onto Expedia to book the above trip, I’m presented with over 300 options just for the hotels. At the top hotel, there are 7 different room types with only a couple of hundred dollars in cost difference for the entire stay between the largest and the smallest. This is already complicated before I throw in personal taste.

Then there are the flights, where the option I’d likely choose based on my personal time preferences is actually 9th on the price-ranked list (though still within $20 of the cheapest option). I just don’t see how the r1 is ever going to give me what I actually want. I know what that is, and I know that a human assistant who has gotten to know my preferences and proclivities would likely make the same choices as I would, but that’s because we both have that very human trait of personal judgement.

I can see how an AI that has had access to all of my past bookings may be able to detect patterns and preferences, but I can’t see any evidence that the r1 has that access, or the ability to learn about me personally. I won’t comment on the Humane Ai Pin, but I can’t see much evidence of that there either.

My feeling is that a merely good assistant, one that is able to follow your directions and get stuff done, is actually quite hard to replicate. Combine that with the traits of a great assistant – one that can anticipate your needs, exercise good judgement and potentially even solve problems before you ask – and we’re at another level of complexity entirely.

It’s not that I’m bearish on AI assistants as a whole but I do think that the role of being an assistant is much more complex and personal than people imagine. I can’t wait to see where we end up with daily AIs that we interact with but I can’t help but feel that an assistant in this manner just isn’t it. Yes, help me sort through the cruft of those 300 hotels, but I don’t think I’ll trust an AI to make the call for me in the same way as I would a human, at least not anytime soon.

Sebastifact: A fact machine for 7 year olds

My son is 7, an age where he is becoming interested in what I do at work. He ‘gets’ the idea of apps and websites, but I wanted to put together a very simple project that we could build together so that he could see how to take an idea and turn it into a real “thing”.

We brainstormed some ideas – he loves writing lists of facts and finding pictures to go with them, with the ambition of building an encyclopaedia – so we started work on a simple website into which he could type the name of a historical person and get back a set of 10 facts.

His design goal was pretty simple – the website should be yellow. I decided it was probably best to focus on the functionality, so yellow it is.

As is usually the case, the backend is where the action is. I wasn’t sure how to explain to him just how complex this site would have been to make just 2 years ago – the idea of entering almost any historical figure into a website and having it simply provide 10 facts back would have required a hugely laborious process and many years of contributions. And yet here we are in 2023, and the solution is “just plug a Large Language Model into it”. This makes for a pretty easy introductory project. We talked a little about how an API works and how computers can talk to each other and give each other instructions, and then set to work writing the prompt that we wanted to use.

As we tested it, he started to ask if it would work for animals too. And then mythical beasts. And then countries. Seeing him working through the ideas and realising that he could widen the scope was great. This is the eventual prompt we settled on:

You are an assistant to a family. Please provide responses in HTML. The User will provide a Historical Person, Country, Animal or Mythical Beast. Please provide 10 facts appropriate for a 7 year old. If the user provided name is not a Historical Person who is dead, a Country, an Animal or Mythical Beast please respond with an error message and do not respond with facts.

By asking the LLM to provide responses in HTML we offloaded the task of formatting the output, and GPT-3.5 Turbo is pretty good at providing actual HTML – I haven’t seen any issues with it yet. Instructing it to make sure the facts are appropriate for a 7-year-old changed the tone of the facts, and we got facts that were (surprise) actually interesting for him without being too pointed in their accuracy.
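For anyone curious what “just plug a Large Language Model into it” actually looks like, here’s a minimal sketch of the backend call using the prompt above. I’m assuming the official OpenAI Python client; the model choice and function name are mine, not necessarily what the site uses:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are an assistant to a family. Please provide responses in HTML. "
    "The User will provide a Historical Person, Country, Animal or Mythical Beast. "
    "Please provide 10 facts appropriate for a 7 year old. If the user provided "
    "name is not a Historical Person who is dead, a Country, an Animal or "
    "Mythical Beast please respond with an error message and do not respond with facts."
)

def get_facts(subject: str) -> str:
    # One chat completion per search; the system prompt does all the work.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": subject},
        ],
    )
    return response.choices[0].message.content  # HTML, ready to drop into the page

print(get_facts("Cleopatra"))
```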

The response takes a few seconds to come back so I implemented caching on the requests - the most popular searches appear instantly. Ideally in the future, I’ll give all of the results URLs.
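The caching can be as simple as keying on the normalised search term – a sketch that builds on the get_facts function above, not the site’s actual implementation:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def _cached_facts(key: str) -> str:
    return get_facts(key)  # get_facts from the sketch above

def facts_for(subject: str) -> str:
    # Normalise first so "Plato" and " plato" share one cache entry;
    # repeat searches then return instantly instead of waiting on the API.
    return _cached_facts(subject.strip().lower())
```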

As a final bonus, I plugged in the Unsplash API to return images for him. It doesn’t always work (Unsplash apparently has relatively few pictures of Plato) but for most searches, it provides a suitable image. I might consider changing that to use the Dall-e API, but I think for now this is good enough.
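Roughly how the Unsplash lookup might look – the search endpoint and response shape are Unsplash’s documented API, but the function and environment variable names are just illustrative:

```python
import os
import requests

def find_image(subject: str) -> str | None:
    """Return the first matching Unsplash photo URL, or None if there isn't one."""
    resp = requests.get(
        "https://api.unsplash.com/search/photos",
        params={"query": subject, "per_page": 1},
        headers={"Authorization": f"Client-ID {os.environ['UNSPLASH_ACCESS_KEY']}"},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json()["results"]
    return results[0]["urls"]["regular"] if results else None
```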

There were two takeaways from this. The first: working with a 7-year-old is a test in scope creep. I wanted to keep this to an afternoon activity so that it would keep his interest, but of course it could have been a much larger site if we had incorporated all of the ideas. Giving him something that he had actually built, and keeping the scope to something achievable that still felt like his own, was the most important thing. The second is something that I hammer on about all the time: an LLM is a massive toolbox that can help users achieve almost anything, but there is great value in providing a User Interface that allows a user to achieve a very specific task. There is a good reason why there are so many kitchen gadgets that could basically be replaced with a single knife – the user experience of using a dedicated tool that requires less skill is better.

The site is at https://www.sebastifact.com

Ray-Ban Meta Glasses: Truly Wearable AI?

I’ve been excited to get my hands on the new Ray-Ban Meta Glasses, and picked up a pair yesterday.

An illustration of a Robot wearing Ray Ban Meta Sunglasses
Me, wearing my new glasses

The most intriguing aspect of the glasses for me is the prospect of mixed-mode AI without taking my phone out of my pocket. Meta won’t release this until probably next year, but I do have some observations on how we could get there slightly sooner.

OpenAI released their multi-modal version of ChatGPT about a month ago, which means that you can now speak to ChatGPT (an oddly stilted style of conversation which is still quite compelling – I wrote about it here) and send it images which it can interpret and tell you about.

One of the cool features that OpenAI included in the voice chat version is that on iOS the conversation is treated as a “Live Activity” – that means that you can continue the conversation whilst the phone is locked or you are using other apps.

What this also means is that the Ray-Ban Metas do have an AI that you can talk to, inasmuch as any Bluetooth headphones connected to an iPhone can be used to talk to the ChatGPT bot whilst your phone is in your pocket. I’ve looked at options to trigger this via an automation and shortcut when the glasses connect to my phone, but ultimately don’t think that is very useful – I don’t want an AI listening all the time; I want to be able to trigger it when I want it. It did lead me to add an “Ask AI” shortcut to my home screen which immediately starts a voice conversation with ChatGPT, which I suppose will help me to understand over time how useful a voice assistant actually is. I also had high hopes that “Hey Siri” would be able to trigger the shortcut, which it can, but not when the phone is locked. So close and yet so far.

As I said above though, this feature is also something that all headphones can be used for. The grail, and ultimate reason for getting the Ray-Bans, is in letting the AI see what you can see. Given that this feature won’t be officially released until probably next year, what options do we have?

The solution may come in the form of another Meta product, WhatsApp. I built a simple WhatsApp bot earlier this year which allows me to conduct an ongoing conversation with the API version of GPT-4; it’s quite rudimentary but does the job. The cool thing about the decision to deeply integrate the Meta glasses with other Meta products is that you can send messages and photos on WhatsApp directly from the glasses without opening your phone. The glasses will also read out incoming messages to you. This works pretty well with the bot that I’ve built; I can send messages using the glasses and they will read back the responses. I can say to the glasses “Hey Meta, send a photo to MyBot on WhatsApp” and it will take a snap and send it straight away. The GPT-4V(ision) API hasn’t been released yet, but once it has been, I’ll be able to send the pictures to the bot via WhatsApp and the glasses will be able to read back the response.
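For anyone wanting to build something similar, one common route is a small webhook behind Twilio’s WhatsApp API – a sketch along these lines (not necessarily how my bot works; the route name, the single-user “memory” and the model choice are all illustrative):

```python
# A rough sketch of one way to wire WhatsApp to GPT-4, using Twilio's
# WhatsApp API and Flask. Twilio POSTs each incoming message to the
# webhook; the TwiML response is relayed back to the sender.
from flask import Flask, Response, request
from openai import OpenAI
from twilio.twiml.messaging_response import MessagingResponse

app = Flask(__name__)
client = OpenAI()
history: list[dict] = []  # naive single-user conversation memory

@app.route("/whatsapp", methods=["POST"])
def whatsapp_webhook():
    incoming = request.form.get("Body", "")
    history.append({"role": "user", "content": incoming})
    completion = client.chat.completions.create(model="gpt-4", messages=history)
    reply = completion.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    twiml = MessagingResponse()
    twiml.message(reply)
    return Response(str(twiml), mimetype="application/xml")

if __name__ == "__main__":
    app.run(port=5000)
```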

This all feels pretty convoluted though and is ultimately an attempt to hack my way around the lack of available functionality. The Meta Glasses are quite cool but they aren’t wearable AI. Yet.

As with many things within the space at the moment, the technology feels tantalisingly close but not quite there. The last time anything felt quite like this to me was the dawn of the smartphone era. Playing with the glasses reminds me oddly of playing with the accelerometer in the Nokia N95… if we’re at that point with AI then the iPhone moment is just around the corner.

Short thoughts on chatting with an AI

OpenAI released multi-modal AI a couple of weeks ago and it has been slowly making its way into the ChatGPT app. It is quite disconcertingly brilliant.

An illustration of a Robot and a Human having a conversation.
An accurate depiction of the process

Conversation is a funny thing. Reading podcast transcripts can be quite nightmarish - we don’t realise how much the spoken word, especially during conversations, meanders and is peppered with hesitation, deviation and repetition until we see it written down. When we’re speaking though, these unnecessary additions make conversation human and enjoyable. When we’re listening, we often don’t realise how much we are actually playing an active role in doing so–in person, it’s the facial expressions and nods which encourage the speaker to continue; on the phone, the short acknowledgements that let a partner know you’re still there and listening. I often speak to a friend on the phone who mutes when they’re not speaking and the experience is fine, but the silence is slightly off-putting.

And so to the experience of chatting with an AI. It’s brilliant, in as much as it actually feels as though you are having something of a conversation. The responses aren’t the same as the ones you would receive by directly typing the same words into ChatGPT – they’ve clearly thought about the fact that spoken conversation is different. There is surprisingly little lag in the response. You don’t say your piece and then wait for 10 seconds for it to process; the AI responds in a couple of seconds, almost straight away once it’s heard a long enough pause. The quality of the AI is fantastic – it’s using GPT-4, which is about as state-of-the-art as it can get, and the voices, whilst not human, are surprisingly great.

However.

The entire experience is disconcerting because of how precise it is. There is no room for you to take long pauses while you think mid-sentence, or rephrase as you talk. There is absolute silence when you are talking which causes you to look down at the screen to make sure it’s still working. The responses are often long and apparently deeply thought through, but they often end with a question, rather than just being an open-ended response to work from. I’m looking forward to having an AI conversational partner, but I want it to help me tease out ideas, not necessarily give me fully formed AI thoughts on a subject. I want it to say “yes” whilst I’m speaking for no apparent reason other than to encourage me to keep talking through the idea. I want it to meander and bring in new unrelated but tangential ideas. Ultimately, I guess I want it to be a little more human.

OK Computer: 21st Century Sounds

Musical Discovery

For much of the latter half of the 20th century, new music discovery went something like this. An artist would make a song and they’d send demo tapes out to record companies and radio stations. They’d play to dimly lit bars and clubs, hoping that an A&R impresario lurked in the crowd. If they were lucky, a DJ might listen to their demo and would play it live. Perhaps someone would record it and start bootlegging tapes. These contraband tapes would be passed around and listened to by teenagers gathered in bedrooms. If all went well, the artist’s popularity would grow. They’d be signed, played more on the radio and do bigger shows. Fans and soon-to-be fans would go to record stores to listen to the new releases and buy the music on vinyl, tape or CD. The record shops would make money, the musicians would make money, the record companies would make more money.

An illustration of a Robot DJing.
R2-DJ

This began to shift in the early noughties, driven, as so much was, by the emergence of the internet. The newfound ability to rip CDs and transform tracks into easily shareable MP3s on the likes of Napster rendered the entire world’s music repertoire available gratis to eager ears. For those who preferred their music to come without the lawbreaking, the iTunes Store and others made purchasing it just about as easy. MP3 players and the iPod made it effortless to carry 1,000 songs in your pocket. The days of physically owning your music were all but over in the space of only a few short years.

Despite the music industry’s hope that killing Napster would stem the rising tide, the death of the platform only resulted in more alternatives appearing. It turned out that people liked having instant access to all music for pretty much free. Music discovery underwent a transformation. To acquire a song, one simply had to search for it, and within minutes, it was yours—provided Kazaa or uTorrent were operational and your parents didn’t pick up the phone and break the connection. Online forums teemed with enthusiasts discussing new musical revelations and leaks, offering nearly everything and anything you desired, all for free.

Music was no longer scarce; there were effectively infinite copies of every single song in the world to which anyone could have immediate access. Gone were the days of friends passing around tapes or lingering in record stores. The social aspect of music discovery shifted from smoky bars, intimate bedrooms, and record emporiums to the virtual amphitheaters of online forums, Facebook threads, and text message exchanges.

The big problem with all of this of course was that it was all quite illegal.

In 2006, Daniel Ek was working at uTorrent. He could see that users wouldn’t stop pirating music and that the attempts by the Music Industry to thwart sharing were doomed to failure. He “realized that you can never legislate away from piracy. Laws can definitely help, but it doesn’t take away the problem. The only way to solve the problem was to create a service that was better than piracy and at the same time compensates the music industry – that gave us Spotify.”

Musical Curation

Spotify launched with the simple headline: A world of music. Instant, simple and free.

By 2023, it had grown to over 500 million users.

For many music fans, playlists took center stage, with enthusiasts investing hours trawling the internet for them. Spotify introduced a feature to see what your friends were listening to via Facebook and then to directly share playlists with others. Then they made playlists searchable. That killed off sites like sharemyplaylist, but meant that when I needed three hours of Persian Wedding songs, all I had to do was hit the search bar and appear to be intimately familiar with the works of Hayedeh and Dariush for my then soon-to-be in-laws.

In 2015 Spotify launched Discover Weekly, a dynamic playlist which introduced users to tracks similar to what the listener had played recently. It was remarkably good. The social aspect of music discovery was being lost, but it was replaced with an automaton that did the job exceptionally well, even if the results were sometimes corrupted by the false signal of repeated plays of Baby Shark.

More subtly, throughout this period the way people consumed music was changing. We had progressed from music discovery as a purposeful act to one in which it was an everyday occurrence. Background music had always existed, but it came via the radio or compilations. This was personal. The value of the music itself transformed. The ability to have a consistent soundtrack playing – at home, at work, in the car or as you made your way through everyday life – meant that listeners weren’t necessarily concerned about the specific artists that were playing; they had become more interested in the general ambience of that ever-present background music. Listeners still certainly relished the release of the new Taylor Swift album, but they also listened to music that they didn’t know more readily, without ever inquiring as to who the artist was, simply because it fit within the soundtrack of their lives.

Discover Weekly was one of Spotify’s first public forays into personalised music curation using machine learning. The success of the project led to more experiments. It turned out that people loved the feature.

Spotify in 2023 is remarkable. When I want to run to rock music, the tempo and enthusiasm of the suggested playlist is exactly right. When I want to focus on work, the beats are mellow and unobtrusive. The playlists “picked for me” change daily, powered by AI. I still create my own playlists, but the experience is now akin to using ChatGPT. I add a few songs to set a general mood and Spotify offers up suggestions that match the general vibe. Prior to a recent trip, I created a playlist called “Italy Background Music”, which Spotify duly filled with tracks I wouldn’t have had the first idea about where to find. They were exactly what I was looking for.

Curation and general discovery, it seems, have been broadly solved by Spotify.

Musical Creation

I’ve become accustomed to hearing tracks that I’ve never heard before, by artists I couldn’t name. Occasionally, I’ve tapped through on an unknown song and discovered that it has only ever had a couple of hundred thousand plays. Spotify is clearly drawing on the entire breadth of artists within its library to match my musical preferences. Or is it?

In 2016, Music Business Worldwide published an article stating:

Spotify is starting to make its own records.

Multiple cast-iron sources have informed us that, in recent months, Daniel Ek‘s company has been paying producers to create tracks within specific musical guidelines.

By introducing its own music into the (literal) mix – music for which it has paid a producer a flat fee, added to the platform under a false name and then surfaced to listeners via its AI-curated playlists – the platform is solving two issues that it considers important. From a user’s perspective, more music that fits their desired soundscape is a good thing. From Spotify’s perspective, being able to add, say, a 3-minute track of in-house music (on which it pays no royalties) to every hour of listening means that its cost for that hour is reduced by 5%. The losers in this case are the artists, who would otherwise have earned from that three-minute play.

There is nothing that says Spotify can’t do this though; it’s their platform, and even when big artists have removed their music from the platform in protest at the company’s policies or actions, they’ve quietly reappeared within months – such is the value of the service.¹

In the above 2016 article, it is clear that the firm was paying actual producers to make the music. In 2023, the landscape is likely quite different. AI has advanced to the point where the beats, melodies and riffs of jazz, trip hop and other non-vocal music can be quite easily produced by a well-trained model. There are dozens of sites that algorithmically generate lo-fi background music. If Spotify isn’t already adding AI-generated tracks that perfectly match a given vibe, especially within those non-vocal genres, then it is at least experimenting with it. The prize is too large not to. In 2021, the company paid more than $7bn to rights holders. At 5%, that’s a nice $350m to find down the back of the AI sofa.

Licensing

Where this leads, in my mind, is somewhere entirely new.

Whilst vocal-less (can we really call it instrumental?) music is the easiest use of the technology, earlier this year we saw a brief explosion of AI-generated tracks from creators using AI voice models that imitated the likes of Drake and Kanye. Whilst these tracks weren’t perfect, they showed an early preview of technology that will change the face of music. The Hugging Face community is full of models of popular artists which can replicate the sound of a given singer or rapper, and it is evident that improvements are moving at a rapid clip, with some now indistinguishable from the original artist.

Licensing of brands exists broadly in most other industries. In fashion it saved Ralph Lauren (although it nearly killed Burberry). It famously turned Nike from pure sportswear into a casual fashion mega-brand. Could we see the emergence of the artist as a brand? The potential for artists to either directly license their musical likeness to a given platform, or to allow producers to use an authorised AI model of their voice to create tracks on which they, or their team, would have final sign-off, could allow vocalists to extend their reach drastically.

Whilst the last idea might sound fanciful, there are artists who already draw on the online community. One of the DJ/producer Deadmau5’s biggest tracks was the result of a fan sending him, via Twitter, a demo vocal mix for a track that Deadmau5 had produced in a livestream the previous day.

We’ve also seen a rash of artists selling their music rights – will the future see those artists who reach the end of their careers sell their “official” AI model to allow them and their families to earn in perpetuity? It’s been proven repeatedly that those artists who adapt to the changing world are the ones that succeed, but this is something entirely new.

What seems certain however is that the music that we listen to in the coming years will be picked for us by machines and at least partially created using AI.

Footnotes

  1. There is a good argument that Taylor’s reasoning for removing her music wasn’t entirely to do with this.

It’s about to get very noisy

The internet is about to get a lot noisier, thanks to the rise of Large Language Models (LLMs) like GPT.

An illustration of robot monkeys typing on typewriters

The Creators of The Internet

40 million people sounds like a lot of people. It’s roughly the population of Canada. And in 1995, it was the population of the entire internet. But today, today the population of the internet is about 5.2 billion. That is quite a lot more people. Most of those people are consumers. They use the internet without adding anything to it; this is fine, it is good in fact. They watch Netflix, chat on WhatsApp and scroll on TikTok. They might tweet on Twitter but, since the great Elon-ification of that place, it appears that even fewer of them are even doing that. These people used to be called ‘lurkers’ on forums. What they aren’t doing is creating websites or trying to get you to buy something or writing blogs like this one. For our purposes we can say that these people are being very quiet indeed.

But then there is a much smaller group of people who, like me, see the internet as the place where they can make a bit of noise and generally conduct some form of creativity, be it in running a business, creating reels for consumers to scroll or cobbling their thoughts into a 1,000-word article. This is also a good thing – consumers can’t consume if no one is creating stuff for them to consume.

Signal vs Noise

Most of the stuff that is being created is pretty average; it’s just background noise, a light hum. A few people might consume it, but it’s receiving 27 likes on Instagram and isn’t really getting in anyone’s way. It’s the background music that plays in shops and hotel lobbies that fills the uncomfortable silence. Always there, but quite pleasant and you’d probably miss it if it wasn’t there.

Then we have the good stuff – these are the pieces that go viral: the think piece that nails a topic so absolutely, the TikTok that gets a bajillion views, the hot take that turns out to be absolute fire, the Stack Overflow answer that comes like an angel of the night to salve your programming pain. This is the sweet melody of signal. We like signal, and we spend most of our consuming time wading through the mire of noise looking for it. Sure, the background noise is a bit irritating, but it’s relatively easy to find the good stuff once you know where and how to look for it. Google is pretty good at finding what we want. We’ve built aggregators like Reddit or Hacker News, where people can come and say “Hey everyone! Look, I found some gold!” and then other people can upvote it if it’s actually gold, or downvote it to oblivion if it’s not actually good.

All of this seems sort of fine. Every year more noise is created, but so too is signal. The good thing about this is that an individual creator could only create so much noise. Automated content was pretty obviously automated content, and even if Google didn’t manage to filter it out for you first, once you started reading it, it was pretty clear that no human had set eyes on it during the creation process. We quickly become attuned to that particular lack of harmony and click the back button, which also signals to Google that the content wasn’t actually helpful, meaning the next person searching for that particular query is less likely to end up seeing that page.

The problem we’re facing now, though, is that creators have been given a whole new kind of instrument to play. It’s not that this instrument is louder or more dominant than the others; it’s that it can create an almost infinite number of songs with barely any human input at all. And they all sound pretty great. LLMs are really good at creating noise. It’s not just that they can create an ungodly amount of content (basically for free); it’s that differentiating that content from human-generated content is, by design, almost impossible. Where a content creator could once have put out a few decent quality articles a day, they can now put out thousands. The ratio of noise to signal is about to dramatically shift.

Welcome to Technical Chops

One of the most exciting elements of working within the tech industry is that there is always a lot of ✨new✨.

New ideas, new products, new words, new technologies. New things are good. New things are shiny. But which new things are going to change the world? Which technologies are going to fundamentally change how we interact with the world? How do we differentiate between hype and practical utility? And what does this all mean for the vast majority of the world?

As a non-technical person, and even as a technical person, it is increasingly difficult to differentiate between the simply new and shiny and the new, shiny and potentially world-changing. I’m looking forward to exploring how new technologies could change the world, and what it means for our next rotation.

Over the past few years I’ve been paying particular attention to:

  • Open Source Software and its wider adoption and integration
  • Artificial Intelligence
  • The evolution of software development and the speed with which the hard becomes easy
  • The deep integration of technology into every facet of our lives

I’m committing to one article a week exploring these topics and more. Follow along here or at @technicalchops on Twitter.