Binding to framebuffer 0 may cause a blank screen


Today I ran into an interesting issue that cost me a few hours of my (not so) precious life to debug and understand. I was implementing some basic shadow mapping, which requires a new render target (you render to a separate buffer rather than to the screen), so you have to switch back and forth between framebuffers. Most OpenGL tutorials out there will simply tell you to bind back to the default framebuffer 0.

Here’s a code excerpt from learnopengl.com (as of 11/16/2018), from their article on shadow mapping. I love this site, and this is in no way a criticism of their content; I’m just using it to point out a probable bug. https://learnopengl.com/Advanced-Lighting/Shadows/Shadow-Mapping

// 1. first render to depth map
glViewport(0, 0, SHADOW_WIDTH, SHADOW_HEIGHT);
glBindFramebuffer(GL_FRAMEBUFFER, depthMapFBO);
glClear(GL_DEPTH_BUFFER_BIT);
ConfigureShaderAndMatrices();
RenderScene();
glBindFramebuffer(GL_FRAMEBUFFER, 0);
// 2. then render scene as normal with shadow mapping (using depth map)
glViewport(0, 0, SCR_WIDTH, SCR_HEIGHT);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
ConfigureShaderAndMatrices();
glBindTexture(GL_TEXTURE_2D, depthMap);
RenderScene();

Binding back to framebuffer 0, as shown here, simply gave me a blank screen. I went through a period of commenting out all my rendering code and re-enabling it line by line (costing me half a day or more) to see what was going wrong, and found the culprit in this line: glBindFramebuffer(GL_FRAMEBUFFER, 0);. Then I suddenly got an idea and ran these lines to find my actual default framebuffer (after disabling all the shadow mapping code and doing a simple setup for single-pass rendering):


// both members are GLint, which is the type glGetIntegerv writes into
glGetIntegerv(GL_DRAW_FRAMEBUFFER_BINDING, &default_draw_fbo_);
glGetIntegerv(GL_READ_FRAMEBUFFER_BINDING, &default_read_fbo_);

And to my surprise, the answer was 3. After allocating a new framebuffer for shadow mapping, this went up to 4 (weird!). I’m using Qt 5.11 as my application framework, and I’m not sure whether this is a bug or a feature (maybe they use framebuffer 0 to render their own stuff). Either way, it seems that the default framebuffer cannot be assumed to be 0.

So, if you’re experiencing a blank screen when trying to render anything that causes you to switch between framebuffers, make sure you find out what exactly your default framebuffer ID is. Then, just switch back to this known number, and all will be well.


// GL_FRAMEBUFFER is still valid and binds both the read and draw targets;
// use GL_DRAW_FRAMEBUFFER if you only want to rebind the draw target
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, default_draw_fbo_);
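
If you switch framebuffers in more than one place, it may be less error-prone to let a small guard object remember the binding for you. Below is a minimal sketch of that idea, not code from my engine: ScopedFramebufferRestore is a hypothetical name, and the query/bind callbacks are injected so the sketch compiles without GL headers. In a real program they would wrap the glGetIntegerv and glBindFramebuffer calls shown above.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper: remembers whichever framebuffer is bound when it is
// created and rebinds that same framebuffer when it goes out of scope.
class ScopedFramebufferRestore {
public:
    ScopedFramebufferRestore(std::function<int()> query_bound,
                             std::function<void(int)> bind)
        : bind_(std::move(bind)), saved_(query_bound()) {}

    ~ScopedFramebufferRestore() { bind_(saved_); }

    ScopedFramebufferRestore(const ScopedFramebufferRestore&) = delete;
    ScopedFramebufferRestore& operator=(const ScopedFramebufferRestore&) = delete;

    // The framebuffer ID that will be restored.
    int saved() const { return saved_; }

private:
    std::function<void(int)> bind_;
    int saved_;
};

// With a real GL context the callbacks would look roughly like this:
//   query: GLint id = 0; glGetIntegerv(GL_DRAW_FRAMEBUFFER_BINDING, &id); return id;
//   bind:  glBindFramebuffer(GL_FRAMEBUFFER, static_cast<GLuint>(id));
```

Create the guard right before binding the shadow map framebuffer, and the previous binding (3 in my case, not 0) comes back automatically at the end of the scope.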

 

HTH.

A comparison with and without normal maps

I added normal mapping to my tiny graphics engine, partly because I’ve gotten by mostly with Phong shading for my research work and thought I should do more graphics stuff, and partly to learn about tangent space. I saw firsthand the remarkable difference a normal map makes.

Map without normal map

[Image: without-normal-map]

Map with normal map (same lighting as above)

[Image: with-normal-map]

It’s pretty cool how much more detail a normal map brings into your models. Hope to integrate more features like this.

Loading Shininess in Materials when using Assimp

I was looking for a way to get the shininess (aka the specular coefficient or specular exponent) from a loaded model when using the Assimp APIs (4.1.0). Unfortunately, nothing in the docs helped me and I had to look into some sample programs to get the answer. So, here’s how to do it:

// material_index is the ID of the material you are interested in
const aiMaterial* ai_material = scene.mMaterials[material_index];
float shininess;
if (AI_SUCCESS != aiGetMaterialFloat(ai_material, AI_MATKEY_SHININESS, &shininess))
{
    // if unsuccessful, set a default
    shininess = 20.f;
}

Here’s an example to get some other property like specular color using Assimp’s structs.

aiColor4D spec_color;
if (AI_SUCCESS != aiGetMaterialColor(ai_material, AI_MATKEY_COLOR_SPECULAR,
        &spec_color))
{
    spec_color = aiColor4D(1.f, 1.f, 1.f, 1.f);
}

HTH

Building Caffe on Windows with CUDA 9.0 for VS 2013, 2015 and 2017

Building Caffe on Windows 10 has been a journey (to put it lightly). Since I have a new-gen graphics card (new for 2018), it does not support CUDA 8.0. So, all posts saying you need CUDA 8.0 are outdated (at least for me). And with CUDA 9.0, Windows Caffe does not compile out of the box.

The good news is that after a couple of days of trying I’ve figured out a workaround. As of now, the only boost versions that work with CUDA 9.0 are 1.65.1 and above. But, interestingly, cmake breaks with boost 1.66.0. I know, welcome to the real-world versioning hell when it comes to actually building stuff.

So, if you got your windows source from: https://github.com/BVLC/caffe/tree/windows

You need to do a couple of things. First, download and install boost 1.65.1 to some path. You will need two directories from it: the root directory, which I’ll call my_boost_1_65_1 (typically C:\local\boost_1_65_1), and the library directory, which changes based on which VS version you downloaded boost for (typically C:\local\boost_1_65_1\lib64-msvc-14.0 for VS 2015). Yes, it sucks that the MSVC version is 14.0 for VS 2015, but such is life in a Microsoft world.

I assume you have checked out the 1.0 version of Windows Caffe. Now, open build_win.cmd in the scripts directory, and modify the cmake command as follows (note the last 3 -D lines, which refer to boost and the boost paths; replace the paths with your own boost root and library directories):

cmake -G"!CMAKE_GENERATOR!" ^
-DBLAS=Open ^
-DCMAKE_BUILD_TYPE:STRING=%CMAKE_CONFIG% ^
-DBUILD_SHARED_LIBS:BOOL=%CMAKE_BUILD_SHARED_LIBS% ^
-DBUILD_python:BOOL=%BUILD_PYTHON% ^
-DBUILD_python_layer:BOOL=%BUILD_PYTHON_LAYER% ^
-DBUILD_matlab:BOOL=%BUILD_MATLAB% ^
-DCPU_ONLY:BOOL=%CPU_ONLY% ^
-DCOPY_PREREQUISITES:BOOL=1 ^
-DINSTALL_PREREQUISITES:BOOL=1 ^
-DUSE_NCCL:BOOL=!USE_NCCL! ^
-DCUDA_ARCH_NAME:STRING=%CUDA_ARCH_NAME% ^
-DBOOST_NO_BOOST_CMAKE=TRUE ^
-DBOOST_ROOT=C:/boost_1_65_1 ^
-DBOOST_LIBRARYDIR=C:/boost_1_65_1/lib64-msvc-14.0 ^
"%~dp0\.."

I also make the following edit, so that the build fails up front if the boost version is wrong. Edit the boost version in Dependencies.cmake (located in the cmake directory):


# ---[ Boost
find_package(Boost 1.65 REQUIRED COMPONENTS system thread filesystem)
list(APPEND Caffe_INCLUDE_DIRS PUBLIC ${Boost_INCLUDE_DIRS})
list(APPEND Caffe_DEFINITIONS PUBLIC -DBOOST_ALL_NO_LIB)
list(APPEND Caffe_LINKER_LIBS PUBLIC ${Boost_LIBRARIES})

And, by changing these two files, everything should be good. Now, execute scripts\build_win.cmd and watch your build succeed. Here is the tail end of my build output:


l_prerequisites.vcxproj]
CUSTOMBUILD : warning : cannot resolve item 'api-ms-win-crt-heap-l1-1-0.dll' [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
CUSTOMBUILD : -- warning : gp_resolved_file_type non-absolute file 'api-ms-win-crt-heap-l1-1-0.dll' returning type 'system' -- possibly incorrect [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
CUSTOMBUILD : warning : cannot resolve item 'api-ms-win-crt-stdio-l1-1-0.dll' [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
CUSTOMBUILD : -- warning : gp_resolved_file_type non-absolute file 'api-ms-win-crt-stdio-l1-1-0.dll' returning type 'system' -- possibly incorrect [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
CUSTOMBUILD : warning : cannot resolve item 'api-ms-win-crt-convert-l1-1-0.dll' [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
CUSTOMBUILD : -- warning : gp_resolved_file_type non-absolute file 'api-ms-win-crt-convert-l1-1-0.dll' returning type 'system' -- possibly incorrect [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]

6897 Warning(s)
0 Error(s)

Time Elapsed 00:14:44.49

Let me know in the comments if you run into more issues.

Cut down video size while maintaining quality

I believe the best way to do this is with ffmpeg (Windows builds: http://ffmpeg.zeranoe.com/builds/, others: https://ffmpeg.org/).

Copy your input video (e.g. in.mp4) to the ffmpeg bin folder. Open a command line, navigate to the same bin folder, and enter this command. Here out.mp4 is the output video.

ffmpeg -i in.mp4 -crf 20 out.mp4

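It works extremely well for most videos. The -crf option sets the encoder’s constant rate factor: lower values mean higher quality and bigger files (for the x264 encoder, which ffmpeg typically uses for .mp4 output, the default is 23), so raise the number a little if the result is still too large. This saved me when my videos were over the size limit for research submissions.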

Using the cross product to determine the orientation of edges and points in 2D

The cross product is an extremely valuable tool when doing geometric calculations. One of its many uses is to determine whether a point is to the left of an edge or to the right of it (also referred to as counter-clockwise or clockwise).

[Image: cross product]

 

Let’s consider the edge OA and the point B: we would like to determine on which side of OA the point B lies.

To start, let’s connect B to O to form OB, and then take the cross product of OA with OB, i.e. OA x OB. In 2D this reduces to the single number OA.x * OB.y - OA.y * OB.x (the Z component of the 3D cross product). It will always be positive if B is on the left (or CCW) with respect to OA.

[Image: right-hand grip rule]

Now, if we use the right-hand corkscrew rule (which can be used to determine the direction of the vector in the Z dimension) and curl our fingers in the direction from OA to OB, i.e. the direction in which OA would be rotated around O onto OB, we see that our thumb points upwards. Based on the convention that upwards is positive, you can easily see why the cross product works. If we do the same from OA to OC, you will see that we have to twist our hand and the thumb points downwards. Thus, if the point is to the right of OA (e.g. C), the cross product will always be negative.
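
Here is a minimal sketch of the left/right test in C++. Vec2, cross and orientation are names I made up for illustration, and the sign convention assumes the usual math axes (X to the right, Y up); with a Y-down image coordinate system, left and right swap.

```cpp
struct Vec2 {
    double x;
    double y;
};

// Z component of the cross product OA x OB.
inline double cross(const Vec2& o, const Vec2& a, const Vec2& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// +1 if b is to the left of edge o->a (CCW), -1 if it is to the right (CW),
// and 0 if o, a and b are collinear.
inline int orientation(const Vec2& o, const Vec2& a, const Vec2& b) {
    const double c = cross(o, a, b);
    return (c > 0.0) - (c < 0.0);
}
```

For real-world coordinates you may want a small tolerance around zero instead of the exact comparison, since nearly collinear points can flip sign due to floating point rounding.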

One use of this is when you’re constructing a spatial graph (such as a road network) and want to order the edges around a vertex (or point) in CW or CCW order. The sign of the cross product between two edges tells you which of them comes first in CCW order, so it can serve as the comparison in a sort. Be careful about sorting on the raw value, though: the cross product equals |OA||OB| sin(angle), so it is largest at 90 degrees rather than 180, falls back to zero at 180 degrees, and scales with the edge lengths. The sign comparison is only unambiguous for edges that are less than 180 degrees apart, so first split the edges into the two half-planes on either side of the chosen edge and then compare within each half. And since the cross product is very fast to compute, this performs well in interactive applications.
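
Below is a sketch of one robust way to do the ordering, under a few assumptions of mine: the edges are given as non-zero direction vectors (neighbour minus vertex), and the order starts at the +X axis instead of a chosen edge. Because the raw cross product value is |a||b| sin(angle), it is not monotonic in the angle, so the sketch uses only its sign, after first splitting the directions into two half-planes. Vec2, half and sort_ccw are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

struct Vec2 {
    double x;
    double y;
};

inline double cross(const Vec2& a, const Vec2& b) {
    return a.x * b.y - a.y * b.x;
}

// 0 for directions with an angle in [0, 180) degrees measured CCW from +X,
// 1 for directions with an angle in [180, 360).
inline int half(const Vec2& v) {
    return (v.y > 0.0 || (v.y == 0.0 && v.x > 0.0)) ? 0 : 1;
}

// Sorts non-zero edge directions in CCW order, starting from the +X axis.
inline void sort_ccw(std::vector<Vec2>& dirs) {
    std::sort(dirs.begin(), dirs.end(), [](const Vec2& a, const Vec2& b) {
        const int ha = half(a);
        const int hb = half(b);
        if (ha != hb) {
            return ha < hb;
        }
        // Same half-plane: a comes first if b lies to its left (CCW of a).
        return cross(a, b) > 0.0;
    });
}
```

To start the order at a particular edge instead, you can sort as above and then rotate the result (for example with std::rotate) so that the chosen edge comes first.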

Debugging CUDA – Tips and Tricks

CUDA is fast but painful to debug. It’s similar to working with OpenGL, which gives brilliant results when it works, but you have no idea what’s going on when it doesn’t. I’m listing a number of ways that you can use to track down issues in your CUDA algorithms. Hopefully, it will ease the pain that I had to go through.

  1. Install Nsight and use CUDA Debugging

This step seems rather obvious, and Nsight gets installed when you install CUDA. But, surprisingly, it’s not obvious to a beginner how to use it and why you should. If you are using Visual Studio and are having problems with your CUDA algorithm, follow these steps to start debugging. Make sure the project is built in “Debug” mode. After building it (don’t run it), open the Nsight menu and click CUDA Debugging. You should now be able to place breakpoints within your CUDA kernels and have them get hit. Also, look at the Nsight messages in the output window, and watch out for error codes.

  2. CUDA Memory checking

Always check for memory access violations. Click on the Nsight menu, make sure “Enable CUDA Memory checker” is checked, and follow the steps under point 1 to debug your application. If there are memory access violations, stop right there! This is the first thing you should correct. Even if your algorithm runs and you are getting some results, there can be plenty of subtle bugs lying around when memory access violations happen. A common cause is threads accessing your arrays outside their bounds. So you need to stop a thread from proceeding if its index is out of range, by including a return statement after an index range check like below:

int x_index = blockDim.x * blockIdx.x + threadIdx.x;
int y_index = blockDim.y * blockIdx.y + threadIdx.y;
 
if ((x_index >= cols) 
	|| (y_index >= rows)) {
	return;
}
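
The reason some threads end up out of range is that the grid is usually rounded up to a whole number of blocks. Here is a small host-side sketch of that arithmetic in plain C++ (Dim2, ceil_div, grid_for and surplus_threads are hypothetical names of mine, not CUDA API):

```cpp
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Number of blocks needed so that blocks * threads_per_block >= n.
inline unsigned ceil_div(unsigned n, unsigned threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Grid size that covers a cols x rows image with the given block size.
inline Dim2 grid_for(unsigned cols, unsigned rows, Dim2 block) {
    return {ceil_div(cols, block.x), ceil_div(rows, block.y)};
}

// Number of launched threads that fall outside the image and therefore
// have to be stopped by the index range check.
inline unsigned long long surplus_threads(unsigned cols, unsigned rows, Dim2 block) {
    const Dim2 grid = grid_for(cols, rows, block);
    const unsigned long long launched = 1ULL * grid.x * block.x * grid.y * block.y;
    return launched - 1ULL * cols * rows;
}
```

For a 100 x 50 image with 16 x 16 blocks this gives a 7 x 4 grid, i.e. 7168 threads for 5000 pixels, so 2168 threads must take the early return.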

  3. Understand Nsight debugging output

Familiarize yourself with the CUDA runtime error codes. Nsight will sometimes give output with an error such as “Program hit error 9 on execution”. What you have to do is look up this error code in the documentation for the CUDA version you are using. Let’s look it up here: http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038. Aha! Now we know what error 9 (cudaErrorInvalidConfiguration) means. It says “This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requesting more shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks. See cudaDeviceProp for more device limitations.” We probably asked the kernel to use 100000 threads per block or something to that effect, which is beyond the number of threads per block that the device supports. Now we know that we need to check the values we are passing at launch and adjust them.
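
One way to catch this kind of mistake before the launch is to compare the requested configuration against the device limits yourself. The following is only a sketch in plain C++: DeviceLimits and LaunchRequest are hypothetical structs standing in for the maxThreadsPerBlock and sharedMemPerBlock fields you would read from cudaDeviceProp, and the limit values in the usage note are assumptions, not queried from a device.

```cpp
#include <cstddef>

// Hypothetical stand-in for the cudaDeviceProp fields that matter here.
struct DeviceLimits {
    int max_threads_per_block;
    std::size_t max_shared_mem_per_block;
};

// Hypothetical description of what a kernel launch is asking for.
struct LaunchRequest {
    int threads_per_block;
    std::size_t shared_mem_bytes;
};

// True if the request stays within the limits; a request that fails this
// check is the kind that would be rejected with an invalid configuration error.
inline bool launch_fits(const LaunchRequest& req, const DeviceLimits& lim) {
    return req.threads_per_block > 0 &&
           req.threads_per_block <= lim.max_threads_per_block &&
           req.shared_mem_bytes <= lim.max_shared_mem_per_block;
}
```

With assumed limits of 1024 threads and 48 KB of shared memory per block, a request for 100000 threads per block fails the check, while 256 threads passes.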

  4. Time your functions

This is something that I found extremely helpful. Here’s a simple C++ snippet I use:

#include <chrono>
#include <iostream>
using Clock = std::chrono::high_resolution_clock;
using std::chrono::milliseconds;

Clock::time_point t0 = Clock::now();
CUDA_segment(pre_segmentation_img, post_segmentation_img, vis_img);
Clock::time_point t1 = Clock::now();
milliseconds ms = std::chrono::duration_cast<milliseconds>(t1 - t0);
std::cout << "Time taken for segmentation: " << ms.count() << "ms\n";

In addition to telling you the execution time, which probably matters to you since you are using CUDA, it also hints at whether your CUDA execution failed. If you are getting a run time like 1ms for something that would usually take about 500ms, you need to hold your enthusiasm. Your algorithm didn’t suddenly become super fast. Your CUDA code probably ran into an error and exited.
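
If you time several functions, the same idea can be wrapped in a small helper. This is a sketch with names I made up (time_ms and suspiciously_fast); the 5% ratio is an arbitrary assumption that you should tune for your own code.

```cpp
#include <chrono>
#include <utility>

// Runs fn once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn) {
    const auto t0 = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Flags a measurement that is far below what the same call usually takes,
// which is a hint that the work bailed out early instead of getting faster.
inline bool suspiciously_fast(double measured_ms, double typical_ms,
                              double ratio = 0.05) {
    return measured_ms < typical_ms * ratio;
}
```

The callable should only return once the GPU work is done (for example because it copies the result back to the host); otherwise you would only be measuring the time it takes to queue the kernel.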

  5. Use a single thread and a single block and check sequential execution logic

If there is a problem with your algorithm and you need to understand why it’s failing, try simplifying your kernel execution to a single thread. This allows you to forget the complexity of parallel execution and debug it like a single-threaded application. Just launch with number of blocks = 1 and threads per block = 1. Also, make any additional modifications to your kernel code so that it follows the same path every time you debug, i.e. if you’re processing an image, make sure it operates on the same sequence of pixels by hard coding the x and y indices (x_index = 200, y_index = 200).

convert_2_closest_color <<<1, 1>>> (cuda_img, valid_colors_);

  6. Fast debugging – Use printf

After following step 3, I prefer to use a lot of printfs for debugging. This allows me to execute the code in “Release” mode, and see what exactly is going wrong at a fast execution speed.

NOTE: Make sure you disable all printfs through a macro when you want to use this code in production.
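
A minimal sketch of such a macro follows. KERNEL_DEBUG and DBG_PRINTF are hypothetical names of mine, and the example function is ordinary host C++ so that it can be compiled anywhere; I’m assuming the same macro is used in the same way around the printf calls inside a kernel.

```cpp
#include <stdio.h>

// Build with -DKERNEL_DEBUG to turn the prints on. Without it, every
// DBG_PRINTF call compiles to nothing and its arguments are never evaluated.
#ifdef KERNEL_DEBUG
#define DBG_PRINTF(...) printf(__VA_ARGS__)
#else
#define DBG_PRINTF(...) ((void)0)
#endif

// Example: counts the values above a threshold, tracing every hit.
inline int count_above(const int* values, int n, int threshold) {
    int hits = 0;
    for (int i = 0; i < n; ++i) {
        if (values[i] > threshold) {
            DBG_PRINTF("value[%d] = %d exceeds %d\n", i, values[i], threshold);
            ++hits;
        }
    }
    return hits;
}
```

Because the arguments disappear together with the call, avoid putting anything with side effects (like ++counter) inside a DBG_PRINTF.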

  7. Write back your output to files and check your output

Even with debugging, the data structures you use are hard to check because of the massive parallelism that’s inherent in CUDA. Try to write out the results of the intermediate steps of your algorithm by doing a cudaMemcpy from device to host. I usually write out the data into CSV files or image files and check the output for any issues that I can see. If you can visualize the data, you will notice a lot of issues that result from errors in your code.
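
As an illustration, here is a sketch of the CSV part in plain C++. It assumes the intermediate result has already been copied into a host-side std::vector<float> stored row by row with at least rows * cols elements; to_csv is a hypothetical helper name.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Formats a row-major rows x cols buffer as CSV text, one image row per line.
// The caller must make sure data holds at least rows * cols values.
inline std::string to_csv(const std::vector<float>& data,
                          std::size_t rows, std::size_t cols) {
    std::ostringstream out;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (c > 0) {
                out << ',';
            }
            out << data[r * cols + c];
        }
        out << '\n';
    }
    return out.str();
}
```

The returned string can then be written to a file with a std::ofstream and opened in a spreadsheet or plotted, one file per intermediate step.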

I hope this helps ease some of the pain of programming in CUDA. Don’t get me wrong, I love CUDA, and I truly love the execution times it gives my algorithms. But debugging is quite a process and takes some getting used to 🙂