@scosman

Dec 26, 2022

Introducing Voicebox

Using ML to help people with cerebral palsy communicate in real time

I’m starting a project exploring the use of new technologies to help people with disabilities, specifically people who can’t speak and can’t type quickly. The goal is to build tools which allow them to communicate their ideas in real time, during everyday conversations.

I have a personal connection to this project. A family member has cerebral palsy. I’ve been communicating with him my whole life. He’s intelligent, expressive, and funny. However, due to his disability he’s forced to choose between being expressive and slow, or brief and fast. Technology has helped a little over the years, but the fundamental tradeoff between brevity and expressiveness is still there.

I’ve been thinking about this problem for years and I’m excited to commit more time to it. There are a few reasons why I think now is a great time. First, recent developments in large language models have opened up possibilities that weren’t available a few years ago. Second, I’m excited about a design I’ve come up with for a new keyboard system for people with limited fine motor function. Finally, having left Apple in October, I actually have some time to commit to this.

This is very much an exploratory project. There will be a lot of prototypes and failing fast. My goal is to try many big ideas with the hope of finding one which has sizeable impact and can be scaled out. Most importantly, I’m treating this a bit like a startup: I care about finding something I can distribute that actually makes an impact in many people's lives. I might not succeed at that, but I’m aiming for it. You can read the project mission on GitHub.

There are three major areas I’m investigating in the first wave:

  1. Use machine learning language models to reduce typing by 10x
    • Can ML help eliminate the fast-but-brief vs slow-and-expressive tradeoff, opening the door to fast, expressive, real-time conversation? I think it might be able to. We can convert intention to expressive sentences, so that “cold” quickly becomes “I’m a bit cold. Could you please get me a sweater?” This builds on a shorthand system we sometimes use in our family, but allows the speaker to be fully independent and very expressive.
    • Here’s a demo of the first version I’ve built, and am starting to test.
  2. Use speech-to-text to completely eliminate text entry in some situations
    • Can we use speech-to-text, combined with LLMs, to passively listen to ambient conversation and show a set of appropriate 1-click responses with zero text entry?
    • Imagine being asked “Where are you from?” and buttons instantly appearing to say “I’m from Toronto” or “I’m from Canada”, without a keyboard in sight. Or imagine “Do you want pizza or pasta today?” being asked to a room, and instantly being able to choose from appropriate replies: “I’d love pizza thanks”, “I’ll take pasta please”, or “Actually, can we get something else?”
  3. Build new user interfaces that can speed text entry 10x
    • For most people, static keyboards are the right choice: muscle memory makes you quite quick. However, I’m not sure they’re the right choice for users who lack fine motor control. I’m working on two keyboard systems, codenamed Morse and Hawkins, both of which allow typing on an iPad or iPhone without moving your hands. They pair classic accessible switch techniques with modern dynamic user interfaces, predictive capabilities, and consumer-grade design.
    • If either of these works, we can expand this UI system beyond text entry and use it to control the top apps and systems people use in their day.
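
To make the first idea a little more concrete, here is a minimal Python sketch of how a shorthand word could be turned into a few full-sentence options. It is an illustration under stated assumptions, not the actual Voicebox code: the prompt wording, the function names, and the `Completer` type are all hypothetical, and `fake_complete` is a stand-in for a real language model so the example runs offline.

```python
from typing import Callable, List

# Hypothetical interface: any function that takes a prompt and returns model text.
Completer = Callable[[str], str]


def build_expansion_prompt(shorthand: str, n_options: int = 3) -> str:
    """Build a prompt asking a language model to expand shorthand into sentences."""
    return (
        "You help a person who types slowly speak in full sentences.\n"
        f"Expand their shorthand into {n_options} natural, polite sentences "
        "they might want to say.\n"
        "Write one option per line, with no numbering.\n"
        f"Shorthand: {shorthand}\n"
    )


def parse_options(raw: str, n_options: int = 3) -> List[str]:
    """Split the model's reply into clean, non-empty, one-line options."""
    # Strip whitespace and stray bullet characters from both ends of each line.
    lines = [line.strip(" \t-•") for line in raw.splitlines()]
    return [line for line in lines if line][:n_options]


def expand(shorthand: str, complete: Completer, n_options: int = 3) -> List[str]:
    """Return up to n_options full sentences for one shorthand input."""
    prompt = build_expansion_prompt(shorthand, n_options)
    return parse_options(complete(prompt), n_options)


def fake_complete(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same canned reply."""
    return (
        "I'm a bit cold. Could you please get me a sweater?\n"
        "- I'm cold.\n"
        "\n"
        "Could you turn up the heat?\n"
    )


if __name__ == "__main__":
    # Each option would become a one-tap button in the UI.
    for option in expand("cold", fake_complete):
        print(option)
```

Keeping the model behind a plain function makes it easy to swap in different models while prototyping, and the same shape could serve the second idea by building the prompt from an overheard question instead of typed shorthand.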

For the next while, I plan to rapidly explore some of these areas. Some ideas will undoubtedly be duds. Some of the testing will introduce new ideas and spawn iteration. I’m hoping I can iterate into a place where I have at least one concept with enough promise to start to scale out to more people. If I’m lucky and find something, I’m excited to build it out into a widely available app or tool.

Finally, I’m calling this project Voicebox. My family member has had a dozen technological devices to aid in communication over the years, from paper boards with custom symbols to early portable computers with voice synthesizers. At some point, we just started calling the technology du jour his “Voicebox”. I’m hopeful I can create tools that make life better for him and others.

You can follow along with developments here:
