Recent Posts

Proof of concept I: Pupil tracking correlates with on-screen movement

1 minute read

Finally, I think I am getting somewhere. After a night of debugging, I realized that my choice of reference point, inner or outer eye corner, must be the same for both eyes (obvious in hindsight, but I figured positive offsets for both pupils would be practical). If the sign of the left pupil offset is different from the sign of the right pupil offset, it is as if the user is following two different objects at the same time!

The underlying statistical measurement I am using, which is also what is used in the Pursuits paper, is Pearson’s product moment correlation coefficient. That is, how much linear correlation there is between two variables. If the value is one, there is perfect positive linear correlation; if it is zero, there is no linear correlation; and if it is minus one, there is a perfect negative (inverse) linear relationship.
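
In code, the measurement boils down to something like this (a minimal Swift sketch; the series names in the usage comment are just placeholders for whatever offsets I keep in the sliding window):

```swift
import Foundation

/// Pearson's product moment correlation coefficient for two equally long series.
/// Returns a value in [-1, 1], or nil if either series has zero variance.
func pearson(_ xs: [Double], _ ys: [Double]) -> Double? {
    guard xs.count == ys.count, xs.count > 1 else { return nil }
    let n = Double(xs.count)
    let meanX = xs.reduce(0, +) / n
    let meanY = ys.reduce(0, +) / n
    var cov = 0.0, varX = 0.0, varY = 0.0
    for (x, y) in zip(xs, ys) {
        cov  += (x - meanX) * (y - meanY)
        varX += (x - meanX) * (x - meanX)
        varY += (y - meanY) * (y - meanY)
    }
    guard varX > 0, varY > 0 else { return nil }
    return cov / sqrt(varX * varY)
}

// E.g. correlate the recent horizontal pupil offsets with the object's x offsets:
// let r = pearson(recentPupilOffsetsX, recentObjectOffsetsX) ?? 0
```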

In the video below, I use bar graphs on the range [-1, 1] to present the correlation over a number of time steps. On the left are the left and right pupil correlations with the movement of the object at the top of the screen. On the right are the pupil correlations with the object at the bottom of the screen.

From the bars, it is pretty easy to see which object is being followed at the moment. If you need a pointer though, a heart lights up underneath the object when there has been a certain amount of correlation during the last few time steps. The correlation drops when the object switches direction, so the heart disappears; hopefully I can fix this by just adjusting the thresholds or…

TODO Apply Kalman filtering on the correlation?
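
If I try that, a one-dimensional Kalman filter over the correlation value might be all that is needed. A rough sketch of what I have in mind (the noise values are placeholders I would have to tune):

```swift
/// Minimal 1-D Kalman filter for smoothing a noisy scalar such as the correlation value.
struct ScalarKalman {
    var estimate: Double = 0          // current filtered value
    var errorCovariance: Double = 1   // uncertainty of the estimate
    let processNoise: Double          // how fast the true value is allowed to drift
    let measurementNoise: Double      // how noisy each raw correlation sample is

    mutating func update(with measurement: Double) -> Double {
        // Predict: the value is assumed constant, only the uncertainty grows.
        errorCovariance += processNoise
        // Update: blend prediction and measurement using the Kalman gain.
        let gain = errorCovariance / (errorCovariance + measurementNoise)
        estimate += gain * (measurement - estimate)
        errorCovariance *= (1 - gain)
        return estimate
    }
}

// var filter = ScalarKalman(processNoise: 0.01, measurementNoise: 0.1)
// let smoothedCorrelation = filter.update(with: rawCorrelation)
```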

The correlation can also be negative, which means the pupil offset moves opposite to the object offset: as one grows, the other shrinks. From the bars in the video, looking at one object seems to maximize the negative correlation with the other object, so that sounds right.

The idea now is to move the hearts along the arrows whenever there is correlation so one can see which object has caught the most attention of the infant/toddler that is using the app.

Switched to Dlib and YUV color space

1 minute read

OpenCV with Kalman filtering produced pretty good stability of the boxes around the eyes, but it would definitely be better to have eye corners as reference points. The OpenCV eye detectors were also unreliable in that they sometimes took much longer to analyze a frame than at other times, resulting in dropped frames.

Dlib has a 5-point detector that produces just the right landmarks for my purpose. The landmarks can be used to realign the face so that the eyes are always level. This way the eye corners provide horizontal as well as vertical reference points, both invariant to head tilt. The eye corner detection also turned out to be more stable in position than OpenCV's eye area detection.
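
The realignment itself is just a rotation by the angle of the line between the two eye corners. A minimal sketch of that geometry (plain CGPoint math; the Dlib landmark extraction is not shown):

```swift
import Foundation
import CoreGraphics

/// Head roll angle estimated from the line between the two eye corners.
func rollAngle(leftCorner: CGPoint, rightCorner: CGPoint) -> Double {
    return atan2(Double(rightCorner.y - leftCorner.y),
                 Double(rightCorner.x - leftCorner.x))
}

/// Rotate a point around a pivot by -angle, so that the eye corners end up level.
func levelled(_ point: CGPoint, around pivot: CGPoint, by angle: Double) -> CGPoint {
    let dx = Double(point.x - pivot.x)
    let dy = Double(point.y - pivot.y)
    let x = pivot.x + CGFloat(dx * cos(-angle) - dy * sin(-angle))
    let y = pivot.y + CGFloat(dx * sin(-angle) + dy * cos(-angle))
    return CGPoint(x: x, y: y)
}

// let angle = rollAngle(leftCorner: leftOuterCorner, rightCorner: rightOuterCorner)
// let pupil = levelled(rawPupil, around: leftOuterCorner, by: angle)
// Pupil offsets measured from the levelled eye corner are now invariant to head tilt.
```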

YUV color space

Using 32-bit RGBA for each camera sample meant a lot of bandwidth and CPU wasted on transfers and conversions, since detection is done in grayscale anyway. Switching to YUV420p has two big advantages in this case: there is a lot less data to handle, and the luminance (Y) channel can be used directly as the grayscale image to perform detection on.
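
Concretely, this means asking the capture output for a biplanar 4:2:0 format and reading plane 0 (luminance) straight out of the pixel buffer. A sketch of what that looks like (capture session setup omitted):

```swift
import AVFoundation

// Ask the camera for biplanar YUV 4:2:0 instead of 32-bit RGBA/BGRA.
let videoOutput = AVCaptureVideoDataOutput()
videoOutput.videoSettings = [
    kCVPixelBufferPixelFormatTypeKey as String:
        kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
]

/// Run a closure over the luminance (Y) plane of a captured frame.
func withLuminancePlane(of pixelBuffer: CVPixelBuffer,
                        _ body: (UnsafeMutableRawPointer, Int, Int, Int) -> Void) {
    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }
    guard let base = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) else { return }
    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)
    let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
    // `base` is effectively an 8-bit grayscale image: width x height with bytesPerRow stride.
    body(base, width, height, bytesPerRow)
}
```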

The downside is that it is not as simple to draw colored geometric figures on, which is why the video of the result uses black and white markers:

AVCapture, Core Image and Vision

less than 1 minute read

Instead of letting OpenCV's face detectors work on the entirety of the image, I could leverage Apple's own (much faster in this case) face detection, pick the largest face (which is presumably the user of the app), and send that to OpenCV (or perhaps Dlib, pending testing).

Image from [Vision Framework: Building on Core ML](https://developer.apple.com/videos/play/wwdc2017/506/).

What was a bit surprising is that Apple's oldest face detection generation, AVCapture, turned out to be perfect for this. The detection is done in hardware at image capture time and does not provide any specific landmarks, just a bounding box for the face, which is all I need.

I would have imagined that the two newer frameworks, Vision and Core Image, would be outstanding in comparison, but that is definitely not the case. Most surprising is that AVCapture is the only framework of the three that provides face yaw and roll angles.
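
Getting that metadata is just a matter of adding an AVCaptureMetadataOutput to the capture session. A minimal sketch (the rest of the session setup is omitted):

```swift
import AVFoundation

final class FaceMetadataReceiver: NSObject, AVCaptureMetadataOutputObjectsDelegate {
    func attach(to session: AVCaptureSession) {
        let output = AVCaptureMetadataOutput()
        guard session.canAddOutput(output) else { return }
        session.addOutput(output)
        output.setMetadataObjectsDelegate(self, queue: .main)
        // Only ask for face metadata.
        output.metadataObjectTypes = [.face]
    }

    func metadataOutput(_ output: AVCaptureMetadataOutput,
                        didOutput metadataObjects: [AVMetadataObject],
                        from connection: AVCaptureConnection) {
        let faces = metadataObjects.compactMap { $0 as? AVMetadataFaceObject }
        // Pick the largest face, presumably the user of the app.
        guard let face = faces.max(by: { $0.bounds.width < $1.bounds.width }) else { return }
        let box = face.bounds                              // normalized bounding box
        let yaw = face.hasYawAngle ? face.yawAngle : 0
        let roll = face.hasRollAngle ? face.rollAngle : 0
        // ... crop `box` out of the frame and hand it to OpenCV/Dlib ...
        _ = (box, yaw, roll)
    }
}
```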

Apply pupil detection here

1 minute read

Like any good cooking show – here’s a little app I’ve prepared beforehand:

iPad app prepared for pupil detection. Personal image by author. August 2018.

This app does basically what I set out to do in the project specification:

Sketch from project specification. Personal image by author. May 2018.

Which then became a sketch in Sketch.app:

Sketch. Personal image by author. June 2018.

It’s coded in Swift, using SpriteKit, so animations are a smooth 60 fps even though the face, eye, and pupil detection might run slower than that in the background. I’m currently also using Vision, Apple’s computer vision framework, for face and eye detection, to see if it makes sense to switch out the OpenCV Haar cascade detection method. Vision definitely seems faster than OpenCV and has about the same accuracy (i.e. it needs Kalman filtering to make detection positions more stable over time).
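
The Vision side of that experiment looks roughly like this (a simplified sketch; the real app feeds it pixel buffers from the camera):

```swift
import Foundation
import CoreVideo
import Vision

/// Detect the largest face and its landmarks in a camera frame using Vision.
func detectFace(in pixelBuffer: CVPixelBuffer,
                completion: @escaping (VNFaceObservation?) -> Void) {
    let request = VNDetectFaceLandmarksRequest { request, _ in
        let faces = request.results as? [VNFaceObservation] ?? []
        // boundingBox is in normalized image coordinates; pick the largest face.
        completion(faces.max { $0.boundingBox.width < $1.boundingBox.width })
    }
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        do { try handler.perform([request]) } catch { completion(nil) }
    }
}

// let leftEyeOutline = observation.landmarks?.leftEye  // normalized eye region points
```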

The goal at the moment is just to visualize which of the two images is being followed. That is, provided the pupil detection I’m using is good enough.

Obviously, there is a lot more I would like to do with it but it’s got a picture of a cat and that is hopefully enough for my child to pay close attention.

Simple feature-based adaptive pupil detection based on histogram CDF (featuring Harry, 21 months old)

less than 1 minute read

After reading and trying a lot of different things, the paper Low Cost Eye Tracking: The Current Panorama (Ferhat and Vilariño, 2016) led me to a paper I had the skills to implement from scratch: Automatic Adaptive Center of Pupil Detection Using Face Detection and CDF Analysis (Mansour and Shanbezadeh, 2010). Finally.

Even though the algorithm as described is intended for the smaller BioID images, it seems to work pretty well with higher resolution images too. Then again, the difference is not that great: at a greater distance from the iPad camera, the number of pixels representing an eye gets closer and closer to the number of pixels the eyes occupy in the BioID images.
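
Stripped of the paper's refinement steps, the core CDF idea looks something like this (a simplified sketch, not the full algorithm: adaptively threshold the darkest fraction of the eye region and take the centroid):

```swift
/// Rough sketch of CDF-based pupil detection on a grayscale eye region.
/// `pixels` is row-major 8-bit luminance data, `width` x `height`.
func pupilCenter(pixels: [UInt8], width: Int, height: Int,
                 darkFraction: Double = 0.05) -> (x: Double, y: Double)? {
    // Histogram and cumulative distribution of the eye region intensities.
    var histogram = [Int](repeating: 0, count: 256)
    for p in pixels { histogram[Int(p)] += 1 }
    let target = Int(darkFraction * Double(pixels.count))
    var cumulative = 0
    var threshold = 0
    for value in 0..<256 {
        cumulative += histogram[value]
        if cumulative >= target { threshold = value; break }
    }
    // Centroid of all pixels at or below the adaptive threshold (the darkest ones).
    var sumX = 0.0, sumY = 0.0, count = 0.0
    for y in 0..<height {
        for x in 0..<width where Int(pixels[y * width + x]) <= threshold {
            sumX += Double(x); sumY += Double(y); count += 1
        }
    }
    guard count > 0 else { return nil }
    return (sumX / count, sumY / count)
}
```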

This is the result, after applying Kalman filtering on the pupil position:

It’s not perfect but will have to do for now!

Bibliography

Ferhat, Onur and Vilariño, Fernando (2016) “Low Cost Eye Tracking: The Current Panorama”. Computational Intelligence and Neuroscience, 2016, 1–14. doi:10.1155/2016/8680541.

Mansour, Asadifard and Shanbezadeh, Jamshid (2010) “Automatic Adaptive Center of Pupil Detection Using Face Detection and CDF Analysis”. Lecture Notes in Engineering and Computer Science, 2180.