Bismita Sahu Summer 2011

From CSclasswiki


7th June, 2011 OpenNI: API between Kinect and PC

I started the project by trying to understand how the Kinect sensor works for video games. As it turns out, Microsoft is yet to launch an SDK for using it with a PC, which would enable a mouse-free or touch-pad-free experience. However, PrimeSense, the company that designs the hardware for the Kinect, has an open-source natural interaction framework called OpenNI, licensed under a GNU license. This framework works in three layers, with an application, middleware and the sensor at each level. The middleware used here is NITE, which extracts data (e.g. a skeleton) from the raw image data sensed by the Kinect. Next I surveyed the different applications of the Kinect which have already been implemented, as my goal is to try different applications of the Kinect with the PC, e.g. controlling the Start menu with hand gestures. I am trying to finish reading the user manual for OpenNI to get a clear picture of how this open-source API works, and to build and run a sample application.

8th June, 2011 FAAST: Skeleton Tracking

Today I got the Kinect running on the PC. I looked into various applications (ranging from controlling humanoid robots to building virtual houses out of physical bricks, without the need for CAD), but FAAST, a piece of software developed by researchers at USC, held my interest. The IEEE article can be referred to here: E. Suma, B. Lange, A. Rizzo, D. Krum, and M. Bolas, "FAAST: The Flexible Action and Articulated Skeleton Toolkit," Proceedings of IEEE Virtual Reality, pp. 247-248, 2011. From what I infer, the software (which is freely distributed and available for modification) gives the skeletal framework of the user, which can be combined with virtual motion. It can thus be extended for use in video games that do not originally have depth-sensing capabilities. One interesting application, on kinecthacks, was an online poker game developed using OpenNI, NITE and FAAST. I would like to try this application on the PC as well as use FAAST for a new application of my own.

9th June, 2011 Trying fun applications with FAAST

I used the Kinect today to play Pacman on the PC. I also used FAAST to move around in Google Maps Street View, and it was fun to scroll up and down all the webpages I was reading today using gestures! FAAST works in two modes: it first calibrates to the user and then uses the key bindings provided by the user to map gestures to keyboard strokes as well as mouse clicks. This makes it extremely adaptable to any kind of video game with a multitude of keyboard and mouse strokes. I tried it on Pacman since it was the simplest game I could think of. I intend to use FAAST for more didactic applications tomorrow.

List of helpful stuff and where to find them:

OpenNI [1]
Kinect sensor drivers [2]
NITE []
FAAST [3]
Brekel [4]: This is software I have not yet worked with, but it is highly useful as it can capture 3D objects for use in 3D packages. I was most interested in this software because it was recently used by a group of researchers in the University of Michigan's 3D Lab to stream real-time Kinect data (motion and depth) into a motion builder. Please refer to this link for complete details as well as a cool video demonstrating it: [5]

10th June, 2011 Retrieving Kinect data and deciding on my project

I used Brekel today to track my 3D motion and store the motion and depth data as a .BVH file. I used a viewer to track the joint deformations of my skeletal image at each moment. Next I looked up the source code of different applications that looked interesting; it was available for only a very limited number of hacks. Two applications captured my special interest:

KINECT APPLICATIONS: (Some of the best sources of Kinect-related hacks and their source code are [freenect], [github], [kinecthacks], [openkinect], [openni], [i-programmer] and [maximum pc])

• Tracking hand gestures

1. TuioKinect

Description: TuioKinect tracks simple hand gestures using the Kinect controller and sends control data based on the TUIO protocol. This allows the rapid creation of gesture-enabled applications with any platform or environment that supports TUIO. [6] Source code: [7]

2. Hand detection software inspired by Minority Report

Description: It uses the Kinect sensor from Microsoft and the recently released libfreenect driver for interfacing with the Kinect in Linux. The graphical interface and the hand detection software were written at MIT to interface with the open-source robotics package ROS, developed by Willow Garage. The hand detection software showcases the abilities of the Point Cloud Library (PCL), a part of ROS that MIT has been helping to optimize. It is able to distinguish hands and fingers in a cloud of more than 60,000 points at 30 frames per second, allowing natural, real-time interaction. [8] Source code: [10]

3. Segmenting boxes by hand-tracking

Video: [] Source Code: [11]

• Constructing a 3D scene [12]

The software library [MRPT] now implements a common C++ interface for the Kinect sensor. The library works on top of both the Linux/Mac OS X driver and the Windows driver. The advantage of this approach is that developers can now write software using one unified API that controls a Kinect sensor on all platforms.

13th June, 2011 Getting Point Cloud Data (PCD) from the Kinect

Today my basic objective was to understand hand-tracking with the Kinect, which I have written about before, and to add some more interesting features to make it more graphical. A top priority was finding some means to convert the Kinect data into a useful form so that I could apply it to a wide range of fields. While dwelling on more interesting applications today, I was fascinated by this particular hack by Garratt Gallagher: [ ] as described in his own blog: "You put a white piece of paper on your desk. You take a black marker, and draw a shape. It can be anything, as long as it encloses an area. You then can press the shape, and it acts like a button. I made the button make a sound, but it could do anything."

As I looked into its source code [13], the basic concept I could grasp is the need to convert the Kinect image and depth data to a point cloud. As this point cloud is not directly usable in a 3D application, we can convert it to a triangular mesh by using surface reconstruction to render a 3D model (executed rather simply by open-source software like MeshLab). I am interested in using the Point Cloud Library (PCL), which helps in processing these point clouds: removing outliers, segmentation, modelling and reconstruction. As the library is coded in C++ and I am not familiar with the use of pointers, which it happens to rely on heavily, I think I need to go through the basics of C++ before trying to figure out the source code.
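The depth-image-to-point-cloud step mentioned above can be sketched with a pinhole camera model. This is a rough illustration, not the actual hack's code; the intrinsics fx, fy, cx, cy are placeholders, not the Kinect's real calibration values:

```cpp
#include <cassert>
#include <vector>

struct Point3 { float x, y, z; };

// Back-project one depth pixel (u, v, depth z in metres) to a 3D point
// using a pinhole camera model. fx, fy, cx, cy are hypothetical
// intrinsics; real values come from the sensor's calibration.
Point3 depthToPoint(int u, int v, float z,
                    float fx, float fy, float cx, float cy) {
    Point3 p;
    p.x = (u - cx) * z / fx;
    p.y = (v - cy) * z / fy;
    p.z = z;
    return p;
}

// Convert a whole depth image (row-major, metres) into a point cloud,
// skipping invalid (zero) depth readings.
std::vector<Point3> depthImageToCloud(const std::vector<float>& depth,
                                      int width, int height,
                                      float fx, float fy, float cx, float cy) {
    std::vector<Point3> cloud;
    for (int v = 0; v < height; ++v)
        for (int u = 0; u < width; ++u) {
            float z = depth[v * width + u];
            if (z > 0.0f)
                cloud.push_back(depthToPoint(u, v, z, fx, fy, cx, cy));
        }
    return cloud;
}
```

Surface reconstruction (e.g. in MeshLab) would then turn such a cloud into a triangular mesh.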


14th June, 2011 Getting familiar with new concepts in C++: pointers, structs and more

I tried going through the syntax of C++ so that I can understand the source code provided with the C++ libraries used to get the point cloud data. I still have a lot to cover. I think I will take one more day to learn the basics before I proceed with understanding the code and writing some myself.

15th June, 2011 Understanding the visualization source code in Point Cloud Library(PCL)

As I went through the program intended to read from a simple OpenNI viewer, I encountered new concepts like namespaces, iterator traits and templates. The code looks too complicated just now, and I am stuck on the use of typedef.

16th June, 2011 Scanning the Simple_Open_Viewer source code in PCL, and thinking of using the C libraries in OpenNI and NITE, as they already have a collection of gestures embedded in them

The code in PCL typically outputs the average frame rate, locks the cloud, sets it to null at each frame and then calls it back again. I could not understand much of the technical detail, so I went back to OpenNI and NITE to try to understand the technical details there. This is a useful page I came across which actually helps me understand how to go about hand-tracking and how to use the C libraries in OpenNI: [14] The code is very akin to handling events in Java, though the source code on the site yields scores of errors when executed. I would like to access the source file, but I think I have to add it to the Ubuntu Software Center first because it doesn't seem to be an executable file.
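The frame-rate bookkeeping mentioned above can be sketched as a rolling average over recent frame timestamps. This is a simplified stand-in for what the PCL viewer does, not its actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Rolling average frame rate over the last `window` frame timestamps
// (in seconds). Call tick() once per frame; fps() reports the average
// rate across the retained window.
class FpsCounter {
public:
    explicit FpsCounter(std::size_t window = 30) : window_(window) {}

    void tick(double timeSeconds) {
        times_.push_back(timeSeconds);
        if (times_.size() > window_) times_.pop_front();
    }

    double fps() const {
        if (times_.size() < 2) return 0.0;
        double span = times_.back() - times_.front();
        // (n timestamps cover n-1 frame intervals)
        return span > 0.0 ? (times_.size() - 1) / span : 0.0;
    }

private:
    std::size_t window_;
    std::deque<double> times_;
};
```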

17th June, 2011 Looking into sample programs in OpenNI and NITE

Today I went back to using the C libraries in OpenNI and NITE because I found the API clear enough to understand, and the documentation is extremely elaborate and easy to follow. I like the fact that it already has gesture-recognizing source code as well as features to track a single hand. My basic hurdle here is being able to program in C, which I have no prior experience in, though trying to learn C++ over the last few days has proved particularly helpful because many features are similar. Also, I would like to use the motion builder in Brekel to animate the user, though I am unable to install Brekel in Ubuntu: every time I try installing any executable file through the Ubuntu Software Center I get the error "Failed to load the package list: This is a serious problem. Try again later. If this problem appears again, please report an error to the developers."

20th June, 2011 Microsoft Research released the SDK for developers

Today I spent all my time looking into the NITE middleware to extend its built-in hand-tracking features. The documentation is very clear, and the NITE tree concept is easy to grasp with its stress on event handlers and listeners. I looked into sample XML code and the point objects used to track hands. I also looked into the work done by various hackers, as I wanted to create a hand-tracking application for media browsing which would be interactive and function much like the cursor. I found helpful techniques to keep the movement steady, match the velocity of the movement on the screen to the actual motion of the hands, and adjust the height of the view area so that the user is not forced into exaggerated motions. By the end of the day, however, I was delighted to find that Microsoft released its developer edition of the SDK three days ago. From my brief glance at its documentation, I find one part extremely appealing: the presence of an audio API, which I have seen in only a single hack, possibly because the open-source SDK released by PrimeSense earlier did not provide one. Many capabilities I have seen in hack applications are now actually incorporated into the SDK, which makes it so easy to use by offering so much in one place. I am really eager to explore it tomorrow and see if I can think of some new application using it.


21st June, 2011 Trying out sample apps in the Microsoft SDK for Kinect

I started the day by learning to use Visual C++ through Microsoft's tutorials, which is a prerequisite for using its API. Then I tried two apps provided with the SDK itself. One happened to be a skeletal viewer; one feature I have not seen in previous hacks is the ability to track two people at the same time. I also tried the speech-recognizing app, which sounded very exciting because I had never witnessed an audio hack before, but there was a build failure showing a compilation error. I could not detect the error because I do not completely understand the code as yet.

22nd June, 2011 Modifying the skeletal image in SkeletalViewer App

I tried to modify the skeletal image to display only a part of it. I could only manage to hide the entire skeletal framework and make the pens/joints appear.

23rd June, 2011 Modifying the video image

With help from Prof. Howe I could display a part of the skeleton and also play around with it. I had mistakenly commented out the Polyline function, which prevented part of the skeleton from connecting up, so no changes appeared when the DrawSegment function was called. I tried modifying the video data, but it was really difficult to understand the code. I went through the programming guide for the NUI API and did get the basic framework the sample programs follow. I will continue working on it tomorrow and try modifying the video data in some way. I understand the basic process of retrieving images using the polling/event method, but it is difficult to learn to modify it. I presume the only thing setting me back is that I have never worked with Windows SDK development, so I think I should go through the basic guide for SDK developers that Microsoft provides for beginners, which would be a good starting point as it is one of the prerequisites assumed for using this SDK.

24th June, 2011 Manipulating the video and skeletal data from the Kinect using the SkeletalViewer app

The first thing I started with today was understanding how the different windows for video, depth and skeleton are handled. The main program, SkeletalViewer.cpp, essentially deals with handling the main application window. However, the whole program only made sense to me after I went back to the basics of creating a user interface in Windows using C++. It is important to notice how a Windows program differs from a console program in C++: whereas a console program has a main() function, a Windows program deals with two functions, WinMain and WndProc. WinMain creates the window, displays it, and dispatches a message when one comes in (in other words, when the user performs a particular task), and WndProc responds to the message with the appropriate function call. Here is a good post on it that presents clear ideas: [15]

Over the past two weeks I have learned that it is difficult to summarize the end results at the end of the day, as I usually miss out some essential things I learned and elaborate on the not-so-important bits. From now on I will try to post the material in small chunks so that I have a record of everything I worked on the previous day for reference.

The sensor data processed in Nui_ProcessThread can be modified to make changes to the VGA view, the skeleton view and the depth image. I omitted the VGA view, and I am thinking about adding a background color to this window. I still have not managed to merge the VGA view with the skeletal view. Basically, I think I had a major misinterpretation: I need to make changes to some part of the streaming data so that I can notice the changes graphically in the VGA view. I have to work on it next week.


28th June,2011 Understanding GDI objects and getting clues to using AppWizard to incorporate base classes into the program

Today I could pin the problem down to my unfamiliarity with Visual C++, though I do understand the skeletal framework of developing a window-based user interface. I am right now skimming the book "Teach Yourself Visual C++ in 24 Hours" by Mickey Williams, which focuses on using bitmaps and image lists, device contexts and pens. I think this will give me a better understanding of the code and equip me with the means to modify the video stream.

Comment from Nick: While it is not necessarily a bad thing to learn about the whole structure of classes and code required to write a standalone app on Windows, you may not need to dig into all of that for this problem. You should feel free to focus just on the video drawing part of the program, unless you feel like you want to continue exploring more broadly.

29th June -1st July,2011 More digging into the code for demo apps

I spent the three days trying to modify the video stream without much success. Windows programming is still not very familiar to me, and the many features provided in the SDK are helpful though also a bit mind-boggling.

4th July, 2011 Getting started with OpenGL

OpenGL is basically used for handling graphics and adding features like texture mapping and animation. I learned the basics, which involved programming in the Windows environment to create objects, manipulate their colors, perspectives, orientation and shading, and manipulate matrices. I think I need to focus on how to manipulate video rather than giving the subject such a broad treatment.



6th July-8th July,2011 Using CLNUI wrapper for Java

My work these three days has chiefly been understanding how to deal with Java packages. I have spent a lot of time setting the paths properly and finally configuring the path in Eclipse Helios. I am still getting an error when I try installing the cg packages, which are not supported due to the lack of some native binaries. I have to work on it again.

11th July, 2011 Finally getting access to the video, depth and skeletal image: Coding in C#

I learned the basics of getting data from the Kinect today using the Microsoft SDK. It involves three simple steps:

1. Create a Runtime object for the NUI (Natural User Interface).
2. Sign up for the event that fires when a skeletal/depth/video frame is ready.
3. Open the stream so that the event will fire.

The article and videos that helped get me started are the following: [16] [17] and [18], which is part of Coding4Fun. After getting access to the data, I tried an app (also shown in the video) along with a couple of others like KinectPaint, a Minority Report-like UI, adding Hulk-like features to the video feed, et cetera, which are coded in C# and easy to understand after learning the basics of getting started.

12th July, 2011 Coding4Fun: A repository of Kinect-related hacks (along with a Kinect toolkit) using the official SDK

I still have an issue accessing the depth data, though I have no trouble accessing the video feed. I get a "Value does not fall within the expected limit" error. I realize now what goes wrong: the depth data cannot be accessed in the same way as the image, because the image I am getting back is 16 bits per pixel and needs to be converted to 32 bits. I found this page extremely helpful: [link title]. Hopefully I will get it to work properly tomorrow. I also find C# easy to use, as it is pretty close to Java. I found these tutorials on MSDN really helpful: [Visual C# for the Java Developer]
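If I understand the SDK's depth format correctly (with player tracking enabled, the low 3 bits of each 16-bit value carry the player index and the remaining 13 bits carry depth in millimetres), the 16-to-32-bit conversion can be sketched like this. The 800-4000 mm range and the grey-scale mapping are my own choices for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Unpack a raw 16-bit Kinect depth reading (assumed layout: player
// index in the low 3 bits, depth in millimetres in the high 13 bits).
int playerIndex(uint16_t raw)     { return raw & 0x07; }
int depthMillimetres(uint16_t raw) { return raw >> 3; }

// Expand one 16-bit reading into a 32-bit ARGB grey pixel for display,
// mapping an assumed usable range of 800-4000 mm to 255..0 intensity
// (nearer objects appear brighter).
uint32_t depthToPixel(uint16_t raw) {
    int mm = depthMillimetres(raw);
    int intensity = 255 - ((mm - 800) * 255) / (4000 - 800);
    if (intensity < 0)   intensity = 0;
    if (intensity > 255) intensity = 255;
    uint32_t g = static_cast<uint32_t>(intensity);
    return (0xFFu << 24) | (g << 16) | (g << 8) | g;  // A, R, G, B
}
```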


Scaling the video frame from the Kinect to make it coincide with the depth frame

The transformation here depends on the x and y displacements as well as a scaling factor, and may be arrived at by using

x' = s*x + dx
y' = s*y + dy

where x' and y' are the coordinates in the video frame and x and y are the respective coordinates in the depth frame. The scaling factor s gives the magnification, and dx and dy give the displacements in the x and y directions respectively.
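The mapping above can be sketched in a few lines; the calibration constants in the test below are placeholders, not measured values for an actual sensor:

```cpp
#include <cassert>

// Uniform scale plus offsets mapping a depth-frame coordinate (x, y)
// to a video-frame coordinate, as in x' = s*x + dx, y' = s*y + dy.
struct Calibration { double s, dx, dy; };

void depthToVideo(const Calibration& c, double x, double y,
                  double& xv, double& yv) {
    xv = c.s * x + c.dx;
    yv = c.s * y + c.dy;
}
```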

28th July, 2011 Removing Background and More

After using the edges from the depth map to draw on the video image and learning loads about calibrating it accurately, I am moving on to background removal in the video image so I can use a more interactive background. I have no trouble tracking the player and coloring it, but I do have issues drawing the video image instead of the colored depth image which I get from the player index. I use a byte array to store the player data; if the method returns a frame, I draw the image data if the byte value ranges from 0 to 255. I don't know where I am going wrong, but I am getting no video at all! I am trying hard to fix it.
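The player-index masking I am attempting can be sketched like this. The buffer names and the flat background color are illustrative, not the actual variables in my program, and for simplicity all buffers here share one resolution (the real depth and video frames differ in size and need the calibration mapping):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Compose an output image: where the player mask is nonzero, copy the
// video pixel; elsewhere write a flat background color.
std::vector<uint32_t> maskBackground(const std::vector<uint32_t>& video,
                                     const std::vector<uint8_t>& playerMask,
                                     uint32_t background) {
    std::vector<uint32_t> out(video.size(), background);
    for (std::size_t i = 0; i < video.size() && i < playerMask.size(); ++i)
        if (playerMask[i] != 0)        // pixel belongs to a tracked player
            out[i] = video[i];
    return out;
}
```

One likely pitfall, which may be what I am hitting: a condition like "byte value from 0 to 255" is true for every pixel, so the mask must test specifically for a nonzero player index.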


Background Removal and using Player Index

I could finally remove the background to display only the players (shown using the video stream from the Kinect). As I reuse the calibration from extracting the depth edges and painting them on the video stream, the video and depth streams from the Kinect line up, which lets me do a neat job of cutting out the players.

Painting an area with a video

WPF essentially has two classes which handle multimedia: MediaElement and MediaPlayer. MediaElement is a UI control, so it can be handled directly in XAML (it can be used along with a visual brush to paint a video), whereas MediaPlayer is not a UI control, so it needs to be drawn using a DrawingContext (which I use in my program) or a combination of the visual drawing and drawing brush classes. Here is a helpful page on the latter: [19]

Displaying an external video in the background of Streaming Player

Considering it would be fun to have some sort of interactive display going on in the background while the player interacts, I want to add an external video as the colored background. This is what I have in mind:

1. Separate the video into its respective image frames.
2. Extract the bitmap from each frame.
3. Save the bitmap to a MemoryStream.
4. Read the MemoryStream into a byte array.

Working with VideoFrames and Bitmaps

As I started working on getting a bitmap from a video frame in WPF/C#, I found the following pages extremely helpful. They show different approaches to doing the very same thing, giving me the freedom to take the most lucid track: 1. Using Aurigma Graphics Mill for .NET, an image-processing add-on which gives direct access to video data: [20] 2. A good way to convert a video frame to a bitmap, on an MSDN page: [21] 3. Some more handy links, including getting the bitmap image from the MediaElement: [22] [23] [24] [25]

Good resource to understand how to use the RenderBitmap class and BmpBitmapDecoder


Kinect Background Removal and External Video Backdrop

I have written about removing the background from the video stream we get from the Kinect by using the edges from the depth stream, with calibration so that the video and depth streams coincide. With the help of the MediaPlayer class (useful for manipulating multimedia in WPF), we can also replace the background color with something more exciting; for example, it would be fun to simulate a dance floor with all the ostentatious lighting while sitting cozily at home in front of the computer. To do this I use the same approach mentioned before for displaying an external video in the background of the streaming player. I use the MediaPlayer class's TimeSpan-based position to set where the video is, updating it every time a Kinect frame arrives (note that the Kinect frame rate is 30 fps). The new byte frame can be modified to also access the converted frames from the external video, and it is passed to the image control every time a new frame becomes available. In this process it was useful for me to learn more about rotating bitmaps to avoid getting an inverted bitmap. This is a page where I found the approach succinct and easy to follow: []. Also, it is important to note that the byte frame must be shifted by 58 bytes to get the correct video frame.
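The frame synchronization described above boils down to a small calculation: each time a Kinect frame arrives (at 30 fps), pick which external-video frame to show based on the elapsed time, looping when the clip ends. A minimal sketch, with the frame rates and helper name as assumptions rather than anything from my actual program:

```cpp
#include <cassert>

// Given elapsed playback time in seconds, the external video's frame
// rate, and its total frame count, return the index of the frame to
// display, looping the clip when it runs out. A 30 fps Kinect callback
// would simply call this once per incoming Kinect frame.
int externalFrameIndex(double elapsedSeconds, double videoFps, int frameCount) {
    if (frameCount <= 0) return 0;
    int frame = static_cast<int>(elapsedSeconds * videoFps);
    return frame % frameCount;
}
```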