If Seen Then That

April 19th, 2023

by Simone Rebaudengo


In the past months, together with our friends at modem.works, we had the chance to get our hands on Snap's latest Spectacles: a pretty nice piece of tech, wrapped in a very hip glasses frame. Together we looked into the present and near-future implications of AR glasses becoming a more common, everyday device. While of course these devices can ‘augment’ what you see, what is probably even more interesting is that they make it possible, and maybe acceptable, to wear a camera on your face. This, of course, can lead to very dystopian thoughts, but it might also open up a very fuzzy and unexplored world of sight-first interactions with the world around us.


Looking at the world around you with AR glasses might of course mean adding new layers of visual information, such as flying direction signs and floating whales. More interestingly, it means that where we look, what we look at and how we look at it become a new form of intent that can be sensed, and a potential interaction with that camera and the algorithm behind it.

If any object in your sight can be recognised by the camera you are wearing, then anything, literally any thing, could become a trigger, a 'button' of sorts to make something happen. And if you put this together with the whole world of 'connectivity'… then it gets interesting.

So what happens when my glasses know that I'm looking at a clock, a lamp or a pigeon? What if I could just turn on the lamp by looking at it? Or send an 'I'm late' email in the morning by looking at my clock? Or send pigeon memes to my Discord when I look at one in real life?

So to answer some of these questions… we made If Seen Then That.

“If Seen Then That” (IFSTT) is an experimental lens that we designed and prototyped to turn every thing you stare at into a trigger for actions in the digital world.

In a mix of Harry-Potter-like dreams and Mind Palace theories, it lets you attach any action you can find on IFTTT (If This Then That) to a particular kind of object. So, yes, you can turn on your RGB lamp or any other connected lamp just by staring at it, but you can also trigger literally anything else.

By looking around your house or any space that you are in, you can see what objects could become potential triggers, depending on what is recognized by the algorithm running in the background. Then you can ‘train’ your lens and decide what can be a trigger.
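One way to picture this mapping from recognised objects to actions is a small lookup table that turns a label into a call to IFTTT's Webhooks service. This is only a sketch under assumptions: the post does not say how the lens actually talks to IFTTT, and the labels, event names and the key below are hypothetical placeholders. The URL pattern is the one IFTTT documents for its Webhooks service.

```python
import json
import urllib.request
from typing import Dict, Optional

# URL pattern of the IFTTT Webhooks service; event and key are filled in per call.
IFTTT_URL = "https://maker.ifttt.com/trigger/{event}/with/key/{key}"

# Hypothetical "trained" triggers: recognised label -> IFTTT event name.
RECIPES: Dict[str, str] = {
    "lamp": "toggle_lamp",
    "clock": "send_late_email",
    "pigeon": "post_pigeon_meme",
}


def build_request(label: str, key: str) -> Optional[urllib.request.Request]:
    """Return the webhook request for a label, or None if it is not a trigger."""
    event = RECIPES.get(label)
    if event is None:
        return None
    # Webhooks accepts up to three optional JSON values; we pass the label along.
    body = json.dumps({"value1": label}).encode("utf-8")
    return urllib.request.Request(
        IFTTT_URL.format(event=event, key=key),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fire(label: str, key: str) -> bool:
    """Send the webhook; returns False when the label has no recipe attached."""
    request = build_request(label, key)
    if request is None:
        return False
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status == 200
```

In this sketch, 'training' the lens simply means adding or removing entries in RECIPES; anything the recognition algorithm labels but that has no entry stays a potential trigger and nothing more.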


It’s not only what you are looking at, but how you look at it. We needed a clear difference in how you interact with objects, one that respects the different ways you might look at things. So we created a kind of progressive disclosure of information and interactions, going from just glancing at something to actually ‘clicking’ it with your eyes.

“Gaze” means that a particular object is in the centre of your view for more than one second, revealing more about what the object is, its label and whether it’s a trigger or not.

“Stare” means that you stop on a specific object for at least three seconds, which gives you time to decide whether you actually want to trigger an action with that object.
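The two thresholds above can be sketched as a small dwell timer. This is a minimal illustration, not the lens's actual code: it assumes something upstream reports which label (if any) sits in the centre of the view, and the class name, state names and the extra 'glance' state for anything under one second are our own additions.

```python
from dataclasses import dataclass
from typing import Optional

GAZE_SECONDS = 1.0   # from the text: centred for more than one second
STARE_SECONDS = 3.0  # from the text: held for at least three seconds


@dataclass
class DwellTracker:
    """Tracks how long the same label has stayed in the centre of the view."""
    label: Optional[str] = None
    since: float = 0.0

    def update(self, label: Optional[str], now: float) -> str:
        """Feed the currently centred label; returns 'none', 'glance', 'gaze' or 'stare'."""
        if label != self.label:
            # The centred object changed, so the dwell timer restarts.
            self.label = label
            self.since = now
        if label is None:
            return "none"
        held = now - self.since
        if held >= STARE_SECONDS:
            return "stare"
        if held > GAZE_SECONDS:
            return "gaze"
        return "glance"


tracker = DwellTracker()
for t in (0.0, 1.5, 3.0):
    print(t, tracker.update("lamp", t))
```

Note that this version keeps returning 'stare' for as long as you hold the look, so a real trigger would probably want to fire only on the first transition into that state.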


By connecting the triggers to some actions on IFSTT, you can explore any useful, playful or surreal set of input things and output actions. You could count every time you look at a pizza to feed your quantified-self apps, or with the same pizza you could also start a party playlist. You could send memes to your Discord every time you look at a phone, call your mom when a pigeon is in front of you or pull the latest news when staring at your coffee for too long.

This is just a start, an experiment, but while it opens up a lot of opportunities, it also opens up many questions.

What if you could not only know that you are looking at a type of object, but train your lens to recognize a very particular one? What if you had, as with IFTTT, a platform of sight-based recipes? Who would own this platform?

If the world around you becomes basically a huge fertile ground for ‘macros’ and chains of actions, how much would it replace ‘computer’ or app interactions? And lastly, if we connect LLMs (large language models) to this, what would we like to know when we look at something, beyond just using it as a trigger?

While of course AR glasses are still in that in-between state of being real but still in the future, with quite a lot of practical limitations, unclear acceptability and an undefined ecosystem, we think there is a lot of potential in focusing not only on new layers to look at, but also on new ways to turn the world around us into a truly open and customizable space of possible interactions.

If you want to know more about the background of this project, read the research piece with modem here.

Keep staring!