Hand-tracking in AR for Jewelry Retail

Michael Girnyak

6 May 2020

01 Challenges that retailers face nowadays

02 Downscale video to use a pre-trained model

03 Find finger edges

04 Fixing finger in place by features recognition

05 Next step

Challenges that retailers face nowadays

Brick-and-mortar retailers were already losing revenue to online shopping before the Covid-19 pandemic. Now that people are forced to do all of their shopping online, it has become even more apparent that retailers must modernize to meet the needs of consumers.

By leveraging modern AR technologies, retailers can enhance the online shopping experience and give sales a needed boost. In this article we’ll show you how this approach can be applied to jewelry stores, using our ring fitting application as an example.

We’ve seen a few such applications in mobile app stores, but all of them either were non-interactive or required additional AR markers. Such limitations drastically spoil the user experience, so our goal was to create a solution that works without markers and takes advantage of modern AR trends.

Hand-tracking in AR for Jewelry Retail - photo 1

We wanted an app that could recognize your hand without any markers and render a 3D model of the ring on the proper finger.

As a teaser, we’ll show you the video with final results. If you are interested in the implementation, we will be going into more detail in a future article.

Downscale video to use a pre-trained model

Fortunately, the MediaPipe team has already created a TensorFlow Lite model that detects a hand along with the exact positions of the fingers. We really recommend checking out their GitHub repo, as it has many more great examples besides this one.

To use this pretrained model, we needed to shrink the camera preview video from its 640×480 resolution to 256×256 pixels. We do that by cropping the frame to its central part and then downscaling it. We can then run the model, which returns a set of points, one for each hand joint.
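The crop-and-downscale arithmetic can be sketched in plain Kotlin. This is only an illustration: `CropRect`, `centerSquare` and `modelToFrame` are hypothetical names, and the sketch assumes the model reports joint positions in the pixel coordinates of its 256×256 input.

```kotlin
// Hypothetical helper type; not part of MediaPipe or OpenCV.
data class CropRect(val x: Int, val y: Int, val size: Int)

// Largest centered square that fits inside a frame of the given size.
fun centerSquare(frameWidth: Int, frameHeight: Int): CropRect {
    val size = minOf(frameWidth, frameHeight)
    return CropRect((frameWidth - size) / 2, (frameHeight - size) / 2, size)
}

// Map a point from model-input coordinates (0 until modelSize) back to
// the coordinates of the original, uncropped frame.
fun modelToFrame(px: Double, py: Double, crop: CropRect, modelSize: Int): Pair<Double, Double> {
    val scale = crop.size.toDouble() / modelSize
    return Pair(crop.x + px * scale, crop.y + py * scale)
}
```

For a 640×480 frame, `centerSquare(640, 480)` gives a 480×480 square starting at x = 80, and the center of the model input, (128, 128), maps back to the center of the frame, (320, 240).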

Hand-tracking in AR for Jewelry Retail - photo 2

A full guide to TFLite model inference is available here.

Find finger edges

The next step was to figure out the width of the finger in pixels so that we could scale the rendered ring to fit it. For this task we used basic math and computer vision. We intentionally marked two dots in red on the previous GIF because, for now, they are the only ones we are interested in, and we will draw the ring only on this finger. Support for other fingers, along with a UI for selecting a finger, will be implemented in the future.

To find the edges, we will use a simple Sobel filter. Those familiar with computer vision (CV) already know what a kernel is and how it is applied to an image, but for everyone else we’ll take a moment to explain. A kernel is essentially an array (for images, it’s usually 2-dimensional) of numerical coefficients along with an anchor point in that array, which is typically located at the center.

How does convolution with a kernel work?

Assume you want to know the resulting value for a particular location in the image. The value of the convolution is calculated in the following way:

Place the kernel anchor on top of a determined pixel, with the rest of the kernel overlaying the neighboring pixels of the image.

Multiply the kernel coefficients by the corresponding image pixel values and sum the result.

Place the result at the location of the anchor in the output image.

Repeat the process for all pixels by scanning the kernel over the entire image.
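The steps above can be sketched in plain Kotlin without any library. The function name `applyKernel` is hypothetical, the anchor is assumed to be the kernel center, and pixels outside the image are taken from the nearest border pixel, which is just one of several possible border policies.

```kotlin
// Apply a kernel to a grayscale image stored as rows of ints.
// Assumptions of this sketch: anchor at the kernel center, and
// coordinates outside the image clamped to the nearest border pixel.
fun applyKernel(image: Array<IntArray>, kernel: Array<IntArray>): Array<IntArray> {
    val rows = image.size
    val cols = image[0].size
    val anchorY = kernel.size / 2
    val anchorX = kernel[0].size / 2
    val output = Array(rows) { IntArray(cols) }
    for (y in 0 until rows) {
        for (x in 0 until cols) {
            var sum = 0
            // Multiply each coefficient by the pixel it overlays and sum.
            for (ky in kernel.indices) {
                for (kx in kernel[0].indices) {
                    val iy = (y + ky - anchorY).coerceIn(0, rows - 1)
                    val ix = (x + kx - anchorX).coerceIn(0, cols - 1)
                    sum += kernel[ky][kx] * image[iy][ix]
                }
            }
            // Store the result at the anchor position in the output image.
            output[y][x] = sum
        }
    }
    return output
}
```

Note that the result is written to a separate output array, so values that have already been computed never feed into later sums.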

In mathematical notation, this operation can be expressed as:

Hand-tracking in AR for Jewelry Retail - photo 3
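The equation image is not reproduced in this text. As a stand-in, here is the formula that the OpenCV documentation gives for filter2D; the symbols follow that documentation and may differ from the ones used in the original figure.

```latex
\mathrm{dst}(x, y) =
  \sum_{\substack{0 \le x' < \mathrm{kernel.cols} \\ 0 \le y' < \mathrm{kernel.rows}}}
  \mathrm{kernel}(x', y') \cdot
  \mathrm{src}\bigl(x + x' - \mathrm{anchor}.x,\; y + y' - \mathrm{anchor}.y\bigr)
```

Strictly speaking this is a correlation rather than a convolution, because the kernel is not flipped before it is applied.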

Fortunately, OpenCV provides the filter2D function, so you don’t have to code all of these operations yourself.

-1	0	1
-2	0	2
-1	0	1

Horizontal

-1	-2	-1
0	0	0
1	2	1

Vertical

Kernels used in the Sobel edge detection

We are going to use only the horizontal one, to avoid noise from the vertical filter and thus increase the accuracy of our edge detection.

As you may notice, the kernel goes from negative to positive values so that it responds to a change in brightness across the image. Unfortunately, on its own it detects only edges that go from dark to bright. For edges in the opposite direction the response is negative, and negative values are clipped to zero when the result is stored in an unsigned 8-bit image, so those edges are lost.

So we need another trick here: we apply two filters, one that goes from left to right and another that goes from right to left.

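import org.opencv.core.CvType
import org.opencv.core.Mat

// The two kernels differ only in sign, so each one responds to the opposite edge direction.
// Note: every row here uses the weights (1, 0, 1), while the classic Sobel kernel
// shown above weights the middle row by 2.
val leftKernel = Mat(3, 3, CvType.CV_16SC1)
val rightKernel = Mat(3, 3, CvType.CV_16SC1)
// Responds where brightness increases from left to right
leftKernel.put(0, 0,
    -1.0, 0.0, 1.0,
    -1.0, 0.0, 1.0,
    -1.0, 0.0, 1.0)
// Responds where brightness decreases from left to right
rightKernel.put(0, 0,
    1.0, 0.0, -1.0,
    1.0, 0.0, -1.0,
    1.0, 0.0, -1.0)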
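To see why two kernels are needed, here is a small library-free sketch that applies a 3-tap kernel to a single image row and clips the result to 0..255, the way an unsigned 8-bit output would. The function name `edgeResponse` and the clamp-to-border handling are assumptions made for this illustration.

```kotlin
// Response of a 3-tap horizontal kernel on one image row, clipped to
// 0..255 as an unsigned 8-bit output image would be.
// Pixels beyond the row ends are taken from the nearest border pixel.
fun edgeResponse(row: IntArray, kernel: IntArray): IntArray {
    require(kernel.size == 3) { "this sketch only handles 3-tap kernels" }
    return IntArray(row.size) { x ->
        val left = row[maxOf(x - 1, 0)]
        val right = row[minOf(x + 1, row.size - 1)]
        val raw = kernel[0] * left + kernel[1] * row[x] + kernel[2] * right
        raw.coerceIn(0, 255)
    }
}
```

For a row such as 10, 10, 200, 200, 10, 10 (a bright finger on a dark background), the kernel (-1, 0, 1) responds only at the dark-to-bright edge on the left, while (1, 0, -1) responds only at the bright-to-dark edge on the right. Each kernel's negative responses are clipped to zero.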

If you’re interested in this topic, check out this platform and try kernels yourself.

You may ask: what if the hand is turned slightly, so that the target finger is not exactly vertical? Would the filters still find the edges correctly? Yes, they would still find them, but less precisely.

So, to get the best results, we rotate the image to straighten the target finger so that it always points upward. To do that, we calculate the slope of the line between the two red dots from the previous demo and derive the angle by which the image must be rotated. With this angle, we build a rotation matrix and apply an affine transformation to the image.

You can see the result of that operation in the next GIF.

Hand-tracking in AR for Jewelry Retail - photo 4

// p13 and p14 are the two points we are interested in, marked in red on the initial GIF
// Calculate the slope of the line through them as a ratio
val slope = (p14.y - p13.y) / (p14.x - p13.x)
// Convert the ratio to an angle in degrees
val slopeAngle = Math.toDegrees(Math.atan(slope))
// Build the rotation matrix for the affine transformation
// (getRotationMatrix2D takes degrees; positive values rotate counter-clockwise)
val rotMat = Imgproc.getRotationMatrix2D(p13, slopeAngle, 1.0)
// The inverse rotation matrix transforms our points back to
// the original coordinates
val inverseRotMat = Imgproc.getRotationMatrix2D(p13, -slopeAngle, 1.0)
// This operation rotates our image as shown above
Imgproc.warpAffine(image, image, rotMat, image.size())
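The same geometry can be checked without OpenCV. The sketch below is an assumption-laden illustration, not the app's code: `Pt`, `segmentAngleDegrees` and `rotateAbout` are hypothetical names, and `atan2` is used in place of a plain slope so that a perfectly vertical segment does not require dividing by zero. The rotation uses the coefficients documented for getRotationMatrix2D with a scale of 1.

```kotlin
import kotlin.math.atan2
import kotlin.math.cos
import kotlin.math.sin

data class Pt(val x: Double, val y: Double)

// Angle of the segment a -> b in degrees, in image coordinates.
fun segmentAngleDegrees(a: Pt, b: Pt): Double =
    Math.toDegrees(atan2(b.y - a.y, b.x - a.x))

// Rotate p about center with the coefficients documented for
// getRotationMatrix2D at scale 1: alpha = cos(angle), beta = sin(angle).
fun rotateAbout(p: Pt, center: Pt, angleDegrees: Double): Pt {
    val a = Math.toRadians(angleDegrees)
    val dx = p.x - center.x
    val dy = p.y - center.y
    return Pt(
        center.x + cos(a) * dx + sin(a) * dy,
        center.y - sin(a) * dx + cos(a) * dy
    )
}
```

Rotating a point by an angle and then by the negated angle returns it to where it started, which is how points found on the rotated image can be mapped back to the original frame. With these coefficients, rotating by the segment's own angle aligns the segment with the x axis. Whether an extra 90 degrees is needed to make the finger point upward depends on how the camera frame is oriented, which the article does not specify.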

As a result, the finger is perfectly vertical, just as we want it. Now we can apply our Sobel filters in both directions (left to right and vice versa).

Hand-tracking in AR for Jewelry Retail - photo 5

// Applying a kernel to our image with the OpenCV library is as easy as this
val leftSobel = Mat()
val rightSobel = Mat()
// Arguments: the source image (in our case grayscale with rotation applied),
// the Mat where the result is stored, the output depth (-1 keeps the
// depth of the source), and our kernel
Imgproc.filter2D(image, leftSobel, -1, leftKernel)
Imgproc.filter2D(image, rightSobel, -1, rightKernel)

Check out the documentation on Filtering with OpenCV.

In this GIF you can see the difference between the left and right horizontal Sobel operators. One highlights left-to-right edges and the other right-to-left edges. By a left-to-right edge we mean one where the shade goes from dark to bright, and vice versa.

Now we just need to decide where we want to put the ring and place it according to the detected values.

As a result, we get the 4 points necessary to render our 3D model of the ring correctly.
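The article does not show how the two edge maps are turned into placement points. One possible approach, offered purely as an assumption, is to walk outward from the joint position along one row of each edge map and stop at the first strong response. The function `findFingerEdges` and its `threshold` parameter are hypothetical.

```kotlin
// Hypothetical sketch: starting at centerX, scan left through the
// left-to-right edge map and right through the right-to-left edge map,
// stopping at the first response at or above the threshold.
// Returns the two x positions, or null if either edge is missing.
fun findFingerEdges(
    leftRow: IntArray,
    rightRow: IntArray,
    centerX: Int,
    threshold: Int
): Pair<Int, Int>? {
    val leftEdge = (centerX downTo 0).firstOrNull { leftRow[it] >= threshold }
    val rightEdge = (centerX until rightRow.size).firstOrNull { rightRow[it] >= threshold }
    return if (leftEdge != null && rightEdge != null) Pair(leftEdge, rightEdge) else null
}
```

The difference between the two returned x positions gives the finger width in pixels at that row. Repeating the scan on a second row would be one way to obtain four points, though that is a guess rather than something the article states.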

Hand-tracking in AR for Jewelry Retail - photo 6

Fixing finger in place by features recognition

Things did not go as smoothly as they would in perfect conditions. The first tests revealed another issue: although the coordinates are detected quite well, they tend to jiggle slightly from frame to frame, creating an annoying jittering effect.

We bet no one will use our app if the ring shakes on their finger.

This problem is caused by imperfections in the pretrained model. The model is trained on small images, so even small imperfections can produce noise that ends up as jitter.

To fix this and “glue” our ring to the previously detected position, we can use another CV trick called homography. The technique is a huge topic that deserves its own article, so we’ll simplify the description a little here.

Using one of the popular feature detectors (SIFT, SURF, ORB, BRISK, etc.), we can detect a number of simple features in the current and previous frames and find out how the image of the finger changed in between. Based on these changes, we can move our previously detected points to a new position and get rid of the jittering effect.

Hand-tracking in AR for Jewelry Retail - photo 7

On the left we have the previous frame, and on the right the current one. Dots represent detected features, and lines connect matching features in the two images. Using these matches, we can find the homography and compute a transformation matrix for our previously detected dots.
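Once a homography is known, moving a previously detected point to its new position is a small calculation. The sketch below assumes the 3×3 matrix has already been estimated, for example with OpenCV's findHomography. The function name `applyHomography` is hypothetical.

```kotlin
// Apply a 3x3 homography, given as rows, to a 2-D point.
// The point is treated as (x, y, 1), multiplied by the matrix, and
// divided by the third component to return to image coordinates.
fun applyHomography(h: Array<DoubleArray>, x: Double, y: Double): Pair<Double, Double> {
    val w = h[2][0] * x + h[2][1] * y + h[2][2]
    require(w != 0.0) { "point maps to infinity" }
    return Pair(
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    )
}
```

For a pure translation by (5, -3), the point (10, 20) moves to (15, 17). Multiplying the whole matrix by a constant leaves the result unchanged, because the division by the third component cancels it out.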

Hand-tracking in AR for Jewelry Retail - photo 8

The first results were quite satisfactory: we got more stable placement points for our ring. Still, this was only a temporary solution, so we trained our own hand-tracking model that doesn’t have issues with jittering.

AR ring try-on with a sizing feature

Relying on our hand-tracking model, we developed a unique AR try-on app for rings that offers its users a smooth virtual try-on experience with realistic rendering of gemstones.

When users open the app, they are invited to take a photo of their hand and virtually try on a ring from a selection of jewelry pieces available in a virtual gallery. Ray tracing lets app users see the play of light in the gemstones and exactly what they look like in real life.

We went further and developed a ring sizing feature that can either be integrated into the AR try-on app or be offered as an independent tool. The technology behind sizing ensures that a ring won’t sit on a knuckle or float on the user’s smartphone screen.

We leveraged the power of high-resolution smartphone cameras and developed a digital tool that measures the size of a ring finger with the help of a credit card. The finger should be placed next to a standard-sized object in the same photo to show the proportions and determine the size. The result is a margin of error that doesn’t exceed 0.1 mm.
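The idea of measuring against a reference object comes down to a simple proportion. The sketch below is not the app's implementation. It assumes the card and the finger lie in the same plane facing the camera, and that both widths have already been measured in pixels. The 85.60 mm width is the ISO/IEC 7810 ID-1 card size used by most credit cards, and the function name `fingerWidthMm` is hypothetical.

```kotlin
// Width of an ISO/IEC 7810 ID-1 card (the usual credit card format) in millimetres.
const val CARD_WIDTH_MM = 85.60

// Convert a finger width in pixels to millimetres, using the card's
// width in pixels in the same photo as the scale reference.
fun fingerWidthMm(fingerWidthPx: Double, cardWidthPx: Double): Double {
    require(cardWidthPx > 0.0) { "card width in pixels must be positive" }
    val pixelsPerMm = cardWidthPx / CARD_WIDTH_MM
    return fingerWidthPx / pixelsPerMm
}
```

For example, if the card spans 856 pixels, the photo has about 10 pixels per millimetre, so a finger that spans 170 pixels is about 17 mm wide.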