This article discusses the development of Clifford’s Reading Adventures, a series of interactive educational games for young children from Scholastic Interactive LLC. The game’s immersive gesture and voice experiences are made possible with the Intel® Perceptual Computing SDK 2013. We discuss the methods used with perceptual computing to capture children’s gestures and voice qualities, troubleshooting tactics for the SDK, and the considerations involved in supporting portable All-in-One PCs.
Figure 1. Clifford The Big Red Dog*
Scholastic Interactive LLC is a division of Scholastic, a global children’s publishing, education, and media company. Scholastic Interactive’s goal is to make games for young children that are not only exciting and fun to play but also educational. Scholastic is interested in perceptual computing and gesture technology as a new area of development for young children’s educational games because the activities are intuitive: the child doesn’t have to be taught how to play the game. The perceptual computing platform, with its voice and gesture technologies, gives children ages 3 and up an easy, natural way to interact with the story material by joining Clifford and his friends on their adventures.
In this Clifford series of four interactive episodes, players watch each adventure and engage with the story using the voice and touchscreen functions of their computer. The story invites children into the action by having them “help” Clifford in a variety of ways through gesture- and voice-activated activities.
Figure 2. Clifford's Reading Adventures Menu
In Clifford, Scholastic saw the perfect opportunity to offer the interactive technology to its youngest readers, who actually see Clifford reacting to their voices and movements. In the featured storyline, children watch animated segments of each adventure, actively engaging with the characters and their activities by touching the screen or calling out answers. Players also advance the storyline as they play games featuring touch and gesture. Each game is based on early core literacy skills and may be repeated as often as desired.
The Intel Perceptual Computing SDK 2013 provides APIs, samples, and tutorials that interpret sensor input so an application can respond to the gestures and speech of the children experiencing the game. The SDK’s core capabilities of speech recognition, close-range hand and finger tracking, face analysis, augmented reality, and background subtraction allow software developers to quickly integrate these capabilities into their applications on the latest tablets, Ultrabook™ devices, and All-in-One PCs. Using microphones, cameras, and touch screens, along with orientation and location functionality (now common on tablets, convertible laptops, and All-in-One PCs), developers can build increasingly immersive applications.
Figure 3. Intel® Perceptual Computing SDK
Scholastic interviewed a number of development teams about the game concept, interactive activities, and the key usability concerns for children. Ultimately, Scholastic partnered with Symbio because its team was experienced in developing and implementing gesture and voice recognition and had an extensive background in children’s education, gaming, and usability.
Adapting perceptual computing technologies to the movements, gestures, and voices of young children poses several challenges. As is routine in Scholastic’s process, each prototype was tested extensively to confirm that the game design was appropriate and the gaming levels were achievable. This testing helped the team identify the challenges the test players (target-age children) encountered, which then pointed the way toward solutions suitable for the target age.
Several aspects of the development phase are especially noteworthy for developers interested in perceptual computing. Following are some of the highlights from the Clifford application development.
Calibrating Voice Recognition
Voice recognition required a number of checks and filter steps to provide acceptable performance. Because a child’s voice changes continuously as they grow, and because the Clifford series targets such a young audience, it was important to ensure that voice recognition was well calibrated to the nuances of children’s voices and speech patterns.
Figure 4. Game screen using voice recognition functionality
Verifying and Locking Gestures
One of the games from Clifford’s Reading Adventures asks kids to help Clifford catch toys falling from a “toy tree.” Kids use their hands to grab an onscreen basket and then gesture left and right to move the basket back and forth to catch the toys.
Figure 5. Clifford's Toy Tree
Developers added algorithmic checks to verify the gestures and lock them to the player’s hand on the basket so it would move in reaction to the child’s gestures. During the testing phase, the young players were engaged, enjoying themselves, and timing their catches well. Prior to the testing, developers had incorrectly assumed a child’s control of the catch would be similar to that of the adult evaluators in the development lab. Working with children’s gestures provided many learning opportunities for the developers, who ultimately had to rethink the game design to accommodate the imprecise gestures of younger children. The motion noise of the children’s large and often erratic movements was hard to capture accurately, as the sensors had difficulty recognizing and interpreting complex, multi-part gestures. Prototyping and iterating on the gestures therefore needed careful moderation to achieve a quality experience. Accommodating child gestures required broadening the movement capture area so that even if a gesture was not precise, the action would still be recognized and the desired response activated.
For example, in another mini-game, players help Clifford pull weeds from his garden. Instead of having children reach their hand down to grab the weed and then throw their hand up to pull it, developers switched to a hand closed/hand open motion to symbolize grabbing the weeds and throwing them away. Making changes to accommodate the children’s developmental capabilities and movements helped make the games more successful.
Figure 6. Kids used gestures to help Clifford clear the weeds
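A hand-closed/hand-open grab maps naturally onto the openness value the camera reports for a tracked hand. The following is a minimal sketch of that idea, not Scholastic’s shipping code: the 0–100 openness scale follows the SDK’s GeoNode data, while the hysteresis thresholds are illustrative values that would need tuning with real players.

```cpp
// Sketch: grab detection from a hand "openness" value, with hysteresis.
// Openness of 0 = closed fist, 100 = open palm, matching the range the
// SDK reports in its GeoNode data. The 20/60 thresholds are illustrative.
#include <cstdio>

class GrabDetector {
public:
    // Returns true while the hand is considered "grabbing."
    bool Update(int openness) {
        // Hysteresis: close below 20, reopen above 60, so a half-open
        // hand doesn't flicker between grab and release every frame.
        if (grabbing_) {
            if (openness > 60) grabbing_ = false;  // hand opened: release
        } else {
            if (openness < 20) grabbing_ = true;   // hand closed: grab weed
        }
        return grabbing_;
    }
private:
    bool grabbing_ = false;
};

int main() {
    GrabDetector detector;
    const int samples[] = {90, 70, 40, 15, 10, 35, 55, 80};  // fake openness stream
    for (int openness : samples)
        std::printf("openness=%d grabbing=%d\n", openness, detector.Update(openness));
    return 0;
}
```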
Following is the code used to calibrate the player’s gestures in an in-game tutorial that asks users to move their hands to spin a ball. Applying exponential smoothing (the //exponential smoothing step in the sample) to the ball-spinning game shown in Figure 7 allowed better control and ease of movement. The smoothing step averages out the unpredictable player movement that the game needs to ignore.
Figure 7. Ball Spinning Screen
Figure 8. Ball spinning tutorial code sample
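The sample in Figure 8 ships as an image, so the sketch below restates the core of the technique in a self-contained form: a single-pole exponential filter applied to the raw hand position each frame. The smoothing factor used here is illustrative, not the value tuned for the game.

```cpp
// Sketch: exponential smoothing of a tracked hand position.
// smoothed = alpha * raw + (1 - alpha) * previous smoothed value.
// A small alpha averages out jittery, erratic child movements at the
// cost of a little lag; 0.25f is an illustrative value, not the game's.
#include <cstdio>

struct SmoothedPoint {
    float x = 0.0f, y = 0.0f;
    bool initialized = false;

    void Update(float rawX, float rawY, float alpha = 0.25f) {
        if (!initialized) {           // seed with the first raw sample
            x = rawX; y = rawY;
            initialized = true;
            return;
        }
        // exponential smoothing
        x = alpha * rawX + (1.0f - alpha) * x;
        y = alpha * rawY + (1.0f - alpha) * y;
    }
};

int main() {
    SmoothedPoint hand;
    // Fake noisy samples standing in for per-frame tracking data.
    const float raw[][2] = {{100, 50}, {140, 42}, {95, 61}, {130, 48}};
    for (const auto& s : raw) {
        hand.Update(s[0], s[1]);
        std::printf("raw=(%.0f,%.0f) smoothed=(%.1f,%.1f)\n", s[0], s[1], hand.x, hand.y);
    }
    return 0;
}
```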
The immersive experiences you can create with the SDK give players immediate in-game reactions to their movements, making them feel like they are physically participating in the game. However, there were some limitations in the camera’s ability to track complicated gestures and in the voice detection’s ability to recognize specific responses voiced by children.
Gestures
The perceptual computing camera focuses on an area approximately 2-3 feet from its location. Due to the short distance between camera and subject, the researchers and development team found that simple movements and small defined gestures were more effective than large motions or complex movements, which might drift past the camera’s range.
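One simple guard is to discard tracking samples whose depth falls outside the camera’s sweet spot, as in the sketch below. The 0.6–0.9 m window is an assumption standing in for the roughly 2-3 foot range described above, not a constant from the SDK.

```cpp
// Sketch: drop tracking samples outside the camera's usable depth range.
// The 0.6-0.9 m window approximates the roughly 2-3 foot sweet spot
// described above; tune against the actual device.
#include <cstdio>

struct HandSample {
    float x, y, z;   // world-space position in meters
};

// Returns true if the sample is close enough to trust.
bool InRange(const HandSample& s, float nearM = 0.6f, float farM = 0.9f) {
    return s.z >= nearM && s.z <= farM;
}

int main() {
    const HandSample samples[] = {{0.1f, 0.2f, 0.7f},   // in range: keep
                                  {0.3f, 0.1f, 1.4f}};  // drifted back: ignore
    for (const auto& s : samples)
        std::printf("z=%.1fm -> %s\n", s.z, InRange(s) ? "track" : "ignore");
    return 0;
}
```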
Trial and error was required to get the gestures to work exactly as desired. The development team had to pay attention to different environment conditions, lighting, and distance to camera.
In terms of the SDK, API, and technology, it is easy to get a basic first version of a gesture working because tutorials, sample code, and data structures are included in the SDK. Once you’ve set up the development environment, you can follow a tutorial, such as the finger-tracking sample, to explore how sensor data maps to code (see the sketch after Figure 9).
Figure 9. Gesture sensors-to-code relationships from the Intel® Perceptual Computing SDK 2013
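As a starting point, the basic per-frame gesture loop looks roughly like the sketch below. It follows the SDK’s gesture tutorial and uses the UtilPipeline helper class that ships with the SDK samples; exact headers, labels, and signatures may differ between SDK releases, so treat this as an outline rather than the game’s actual code.

```cpp
// Sketch of the basic gesture loop, following the SDK's gesture tutorial.
// Assumes the Intel Perceptual Computing SDK 2013 and its UtilPipeline
// sample helper (util_pipeline.h); verify names against your SDK version.
#include "util_pipeline.h"
#include <cstdio>

int main() {
    UtilPipeline pipeline;
    pipeline.EnableGesture();                  // turn on the gesture module
    if (!pipeline.Init()) return 1;            // camera not found / init failed

    while (pipeline.AcquireFrame(true)) {      // block until a frame is ready
        PXCGesture* gesture = pipeline.QueryGesture();
        PXCGesture::GeoNode node;
        // Query the primary hand's geometry node for this frame.
        if (gesture->QueryNodeData(0, PXCGesture::GeoNode::LABEL_BODY_HAND_PRIMARY,
                                   &node) >= PXC_STATUS_NO_ERROR) {
            // positionImage gives pixel coordinates in the camera image,
            // the field the team refers to as positionImage.x/y below.
            std::printf("hand at (%.0f, %.0f)\n",
                        node.positionImage.x, node.positionImage.y);
        }
        pipeline.ReleaseFrame();
    }
    pipeline.Close();
    return 0;
}
```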
Developers found that the SDK lacked documentation on the different coordinate systems used for gestures, so they had to work out the mappings themselves through trial and error.
Figure 10. Visual of gesture coordinates
The team initially used the “node[8].positionImage.x/y” approach, discarding the depth information as it was not needed for the gestures being implemented. Later, the team found a better approach: using the depth image and searching for the nearest pixel to pick up the gesture more reliably, then applying exponential smoothing to further improve detection.
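Here is a minimal sketch of the nearest-pixel idea: scan the depth image for the closest valid pixel and treat it as the hand position, then feed that position into the smoothing filter shown earlier. The 16-bit depth layout and the values used to mark invalid pixels are assumptions; check the actual format delivered by the camera.

```cpp
// Sketch: find the nearest pixel in a depth image as a hand-position cue.
// Assumes a 16-bit depth map where larger values are farther away and
// invalid pixels are 0 or saturated; adapt to the actual camera format.
#include <cstdint>
#include <cstdio>
#include <vector>

struct NearestPixel { int x = -1, y = -1; uint16_t depth = UINT16_MAX; };

NearestPixel FindNearest(const std::vector<uint16_t>& depth, int width, int height) {
    NearestPixel best;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            uint16_t d = depth[y * width + x];
            if (d == 0 || d == UINT16_MAX) continue;  // skip invalid pixels
            if (d < best.depth) {                     // closer than best so far
                best.x = x; best.y = y; best.depth = d;
            }
        }
    }
    return best;
}

int main() {
    // Tiny fake 3x2 depth map; the hand (closest point) is at (1, 1).
    std::vector<uint16_t> depth = {900, 850, 920,
                                   870, 640, 910};
    NearestPixel hand = FindNearest(depth, 3, 2);
    std::printf("nearest pixel (%d, %d), depth %u\n", hand.x, hand.y, hand.depth);
    return 0;
}
```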
Voice Recognition
The voice recognition functionality of the game varied greatly by device and scenario: it worked well on some devices and in some situations, but on others it didn’t work at all.
In the game, young children are prompted to repeat appropriate commands, which are picked up by the microphone. Accurate recognition must be achieved even with background noise and game music. Voice recognition works either in speech detection mode, where it tries to detect whatever is said, or in dictionary mode, where it matches what is said against a dictionary of words defined by the game.
At first the team tried detection mode, configured to accept almost any utterance, since young children’s speech is not always well enunciated. However, this did not work as well as expected. The team switched to dictionary mode, which works well in clean scenarios when words are spoken clearly. To accept words that were not clearly enunciated, the team tried adding variants of each word (sell, cell, sail, ail), but the larger the keyword list, the greater the chance of an error or mismatch. Developers had to balance the number of acceptable keywords against the potential error rate. In the final application, the accepted words were kept to a minimum to keep the interaction simple for the kids.
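A dictionary-mode setup with a deliberately small command list might look like the following sketch. It is modeled on the SDK’s voice sample and its UtilPipeline helper; the class, callback, and field names are recalled from the SDK 2013 samples and should be verified against the SDK headers before use.

```cpp
// Sketch of dictionary-mode voice setup, modeled on the SDK's voice sample
// and its UtilPipeline helper. Callback and field names follow the SDK 2013
// samples as best recalled; verify against your SDK version.
#include "util_pipeline.h"
#include <cstdio>
#include <string>
#include <vector>

class CommandPipeline : public UtilPipeline {
public:
    CommandPipeline() {
        EnableVoiceRecognition();
        // Keep the dictionary small: fewer keywords, fewer mismatches.
        std::vector<std::wstring> commands;
        commands.push_back(L"go");
        commands.push_back(L"stop");
        commands.push_back(L"help");
        SetVoiceCommands(commands);
    }

    virtual void PXCAPI OnRecognized(PXCVoiceRecognition::Recognition* data) {
        // data->label is the index of the matched command in the list above.
        std::wprintf(L"matched command #%d\n", (int)data->label);
    }
};

int main() {
    CommandPipeline pipeline;
    pipeline.LoopFrames();   // run the pipeline until it is stopped
    return 0;
}
```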
As touch-screen manufacturing advances, larger screens are being produced and sold, and Clifford’s Reading Adventures also demonstrates this expansion in screen size. Many of these larger screens are being incorporated into a segment of the computer market called All-in-One (AIO) PCs.
AIO PCs consist of a monitor (ranging from 18 to 55 inches) with a motherboard built behind the screen. They feature high-performance processors, high-definition (HD) resolution (720p/1080p), and a Bluetooth* wireless keyboard and mouse, and many include a built-in high-capacity battery that makes the device easily portable. AIOs are one of the fastest-growing categories of PCs. A big reason for their popularity is that AIOs let you do everything you expect of a PC: keep track of household expenses, do homework, play interactive games, browse the web, chat with friends, and watch TV and movies.
Many of these newer AIOs are also portable (pAIO), adding to their versatility. The pAIO devices enable game and application developers to take full advantage of larger screen real estate, high-performance networking capability, and a multitouch user interface (UI), all in a slim, portable device that can be used in both tilted and flat-surface modes. The internal battery ensures continuity of experience, and built-in wireless networking allows roaming from one home location to another. The large HD display is backed by high-end graphics processors and a fully multitouch-enabled user experience (UX). All of these features offer an attractive package for developers looking to break free of the constraints of a single-user mobile device.
The Clifford developers were excited to see their game played on larger screens, so they made sure it ran well at 1920×1080 resolution.
The teams had great fun during development testing and learned a great deal performing user studies with their target audience (kids). While this structured testing was very helpful, it was most rewarding to see the final game played by the developers’ own families. One senior developer showed it to his three-year-old daughter, and the team was delighted to see how engaged she was with it and how much fun she had. Score!
Figure 11. Clifford and his happy playmates
The Scholastic team is excited about using the technology in more titles. Scholastic and Symbio are working on a new game featuring the Intel® RealSense™ 3D SDK, planned for release in Fall 2014.
Figure 12. The game is now available
First announced at CES 2014, Intel® RealSense™ technology is the new name and brand for what was Intel® Perceptual Computing technology, the intuitive user interface SDK with functions like speech recognition, gesture, hand and finger tracking, and facial recognition that Intel introduced in 2013. Intel RealSense Technology gives developers additional features including scanning, modifying, printing, and sharing in 3D plus major advances in augmented reality interfaces. Using these new features, users can naturally manipulate scanned 3D objects using advanced hand- and finger-sensing technology.
Tim Duncan is an Intel Engineer described by friends as “Mr. Gidget-Gadget.” Currently helping developers integrate technology into solutions, Tim has decades of industry experience, from chip manufacturing to systems integration. Find him on the Intel® Developer Zone as Tim Duncan (Intel).
Source code was provided by Scholastic Interactive LLC as a model strategy for exponential smoothing functionality in apps using Intel’s Perceptual Computing technology on the Windows* 8 platform.
Scholastic Sample Source License Copyright (c) 2014, Scholastic Interactive LLC. Code pertaining to exponential smoothing functionality contained in the Clifford’s Reading Adventures 1.0 game (“Sample Code”). All rights reserved. Redistribution and use in source and binary forms of the Sample Code, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. FOR THE AVOIDANCE OF DOUBT, THE ONLY RIGHTS GRANTED UNDER THIS LICENSE ARE LIMITED TO THE SOFTWARE SPECIFICALLY DESCRIBED ABOVE, AND ANY USERS OF THE SAMPLE CODE SHALL HAVE NO LICENSE OR RIGHTS IN OR TO (A) ANY OTHER SOURCE OR BINARY CODE, OR ANY OTHER SOFTWARE OR TOOLS, THAT MAKES UP OR IS EMBEDDED IN THE CLIFFORD’S READING ADVENTURES GAME, OR (B) ANY OTHER INTELLECTUAL PROPERTY OF THE COPYRIGHT HOLDER OR ITS AFFILIATES.
Clifford Artwork © Scholastic Entertainment Inc. CLIFFORD THE BIG RED DOG and associated logos are trademarks of Norman Bridwell. All rights reserved.