Using a Screen Reader as a Proxy “AI Assistant”

I spent a good month experimenting with my Samsung Galaxy S8 smartphone set in “eyes free” mode. Instead of staring at the small screen to surf the web, read email, or use my other favorite apps, I let a screen reader narrate the information and help me navigate the screen. Speaking in a female voice generated by the built-in Google Text-to-Speech engine, it guided me by announcing which icon or menu item was active and what actions I could take next.
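
Under the hood, that narration comes from the same text-to-speech service any Android app can call. As a rough illustration (not TalkBack's actual code; the activity name and spoken string are made up), here is a minimal Kotlin sketch that drives the default Google TTS engine to announce a focused item:

```kotlin
import android.app.Activity
import android.os.Bundle
import android.speech.tts.TextToSpeech
import java.util.Locale

// Hypothetical demo activity: speaks a label roughly the way a screen reader
// might announce the item that just received focus.
class NarratorDemo : Activity(), TextToSpeech.OnInitListener {

    private lateinit var tts: TextToSpeech

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // Binds to the device's default TTS engine (Google Text-to-Speech on my S8).
        tts = TextToSpeech(this, this)
    }

    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.US)
            // Announce the focused item and the available action.
            tts.speak("Gmail. Double tap to open.", TextToSpeech.QUEUE_FLUSH, null, "announce-1")
        }
    }

    override fun onDestroy() {
        tts.shutdown()
        super.onDestroy()
    }
}
```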

The whole point of this exercise was to simulate interacting with an AI assistant like Alexa or Siri, except that you send your intent silently (via touchscreen gestures or keyboard input) and it responds just as silently through your Bluetooth earbuds. A screen reader, software used by people who are blind or visually impaired to operate a computer or mobile device, fits the bill nicely. One thing that’s lacking is the ability to send high-level intents (e.g., “read my new emails”); instead you have to open the email app and scan through each unread item, but that’s a detail we’ll ignore for now.

I tried the TalkBack and Voice Assistant programs for my experiment, as both came preinstalled on my Galaxy S8. TalkBack is the official Android screen reader from Google. Voice Assistant, developed by Samsung, is similar in form and function, which is unsurprising because it was derived from an earlier version of TalkBack. There are some differences, which I’ve summarized below:

TalkBack:
– Open source, actively maintained
– Supports all Google TTS voices
– Audio ducking (fades other audio sources while TalkBack is speaking)

Voice Assistant:
– Proprietary to Samsung
– Supports only the US English voice on the Google TTS engine (but all Samsung engine voices)
– More keyboard shortcuts (jump to paragraph/line/word)

Comparison of TalkBack and Voice Assistant for Android

It took me several days to get comfortable using either program. Partly that’s because I tended to select actions before TalkBack/Voice Assistant stopped speaking, which would throw it off. I also struggled to remember the numerous navigation gestures. Here’s a sample:

Example of TalkBack gestures (the first 10 basic gestures, from Deque University’s PDF guide: https://dequeuniversity.com/assets/pdf/screenreaders/talkback-images-guide.pdf). Visit Deque University’s TalkBack tutorial for more.

To force myself to use the screen reader, I installed the Black Screen app to completely blank out the screen, but going 100% eyes free proved too difficult and I disabled it after a day. Without it, I probably ended up peeking about 10% of the time, mostly when I got lost and could no longer work the apps. Having gone through this struggle, I now have a deeper admiration for blind users who deftly use smartphones.

My Experience

One thing is for sure: I loved being able to catch up on news, follow my favorite blogs, read ebooks, stream music, and use other apps without having to look at the phone screen to navigate, and without having to speak commands out loud (which isn’t always appropriate). This is a powerful capability.

From a cognitive standpoint, it’s also pretty cool to tap the tactile and auditory channels of the brain, leaving the visual system dedicated to its own task. In other words, you can access information on demand without shifting your eyes to glance at it. Your eyes are free to keep looking at what’s important or, if you choose, to stay closed.

Tom Cruise manipulating objects on a virtual screen with both hands in the movie Minority Report.

This also opens up interesting possibilities in VR/AR, gaming, and other immersive applications where users interact with objects spatially (think Minority Report). Instead of projecting text, it can sometimes be more effective to convey information non-visually.

However, I found several limitations with TalkBack and Voice Assistant:

  1. You can’t type text without looking at the on-screen keyboard. This appears to be a limitation of existing Android screen readers. Apple’s VoiceOver is considered by many blind users to be superior and offers on-screen braille input.
  2. Navigating long text documents is cumbersome. For example, skipping to the next paragraph is a two-step action: you first select the element mode (paragraph / sentence / word / letter), then swipe right (a sketch of the underlying accessibility action follows this list).
  3. Copying and pasting blocks of text is cumbersome.
  4. Some gestures have no equivalent keyboard shortcuts. Important gestures such as swipe left and swipe right simply cannot be performed from a keyboard.
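
To see why paragraph navigation is a two-step dance, it helps to peek at the accessibility API underneath. The sketch below is my own illustration using standard Android framework calls, not TalkBack’s source; the helper name is made up. Moving by paragraph boils down to a single “movement granularity” action on the focused node, and the screen reader simply exposes the granularity choice and the swipe as separate steps:

```kotlin
import android.os.Bundle
import android.view.accessibility.AccessibilityNodeInfo

// Illustrative helper: advance the reading cursor by one paragraph on the
// node that currently holds accessibility focus.
fun nextParagraph(focused: AccessibilityNodeInfo): Boolean {
    val args = Bundle().apply {
        // Step 1 in the UI: the element mode you picked (paragraph here).
        putInt(
            AccessibilityNodeInfo.ACTION_ARGUMENT_MOVEMENT_GRANULARITY_INT,
            AccessibilityNodeInfo.MOVEMENT_GRANULARITY_PARAGRAPH
        )
    }
    // Step 2 in the UI: the swipe right that actually moves the cursor.
    return focused.performAction(
        AccessibilityNodeInfo.ACTION_NEXT_AT_MOVEMENT_GRANULARITY,
        args
    )
}
```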

The lack of keyboard mappings for all gestures is a big deal: it prevents the Twiddler3 from fully replacing the phone’s touchscreen for every TalkBack or Voice Assistant function. To some extent this isn’t surprising. The Android team and Samsung expect users to input gestures via the touchscreen, and the keyboard is treated as a second-class citizen.
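
One could imagine working around this with a small custom accessibility service that filters hardware key events (say, chords from the Twiddler3) and turns them into focus moves. The following is purely a hypothetical sketch, not a feature of TalkBack or Voice Assistant; the key bindings and class name are invented, and a real screen reader traverses the node tree with far smarter ordering rules than focusSearch():

```kotlin
import android.accessibilityservice.AccessibilityService
import android.view.KeyEvent
import android.view.View
import android.view.accessibility.AccessibilityEvent
import android.view.accessibility.AccessibilityNodeInfo

// Hypothetical service: would need canRequestFilterKeyEvents="true" and
// android:accessibilityFlags="flagRequestFilterKeyEvents" in its XML declaration.
class KeyboardNavService : AccessibilityService() {

    override fun onAccessibilityEvent(event: AccessibilityEvent) { /* not needed for this sketch */ }
    override fun onInterrupt() { }

    override fun onKeyEvent(event: KeyEvent): Boolean {
        if (event.action != KeyEvent.ACTION_DOWN) return false
        val focused = findFocus(AccessibilityNodeInfo.FOCUS_ACCESSIBILITY) ?: return false

        return when (event.keyCode) {
            // Invented bindings standing in for "swipe right" / "swipe left".
            KeyEvent.KEYCODE_BRACKET_RIGHT -> moveFocus(focused, View.FOCUS_FORWARD)
            KeyEvent.KEYCODE_BRACKET_LEFT  -> moveFocus(focused, View.FOCUS_BACKWARD)
            else -> false  // let every other key pass through untouched
        }
    }

    // Crude approximation of moving to the next/previous item on screen.
    private fun moveFocus(node: AccessibilityNodeInfo, direction: Int): Boolean {
        val target = node.focusSearch(direction) ?: return false
        return target.performAction(AccessibilityNodeInfo.ACTION_ACCESSIBILITY_FOCUS)
    }
}
```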

Where to go from here? I’ll cover that in a follow-up post.
