Debugging CUDA – Tips and Tricks

CUDA is fast but painful to debug. It’s similar to working with openGL, which gives brilliant results when it works, but you have no idea what’s going on when it doesn’t. I’m listing down a number of ways that you can use to track down issues in your CUDA algorithms. Hopefully, it will ease the pain that I had to go through.

  1. Install Nsight and use CUDA Debugging

This step seems rather obvious, and Nsight gets installed when you install CUDA. But, surprisingly its not obvious to a beginner how to use and why you should use it. If you are using Visual Studio, and are having problems with your CUDA algorithm, follow these steps to start debugging. Make sure the project is built in “Debug” mode. After building it (don’t run it), open the Nsight menu and click CUDA Debugging. And now, you should be able to conveniently place breakpoints within your CUDA kernels, that get hit. Also, look at the Nsight output in your output information, and watch out for error codes.

  1. CUDA Memory checking

Always, make sure for memory access violations. Click on the Nsight menu and make sure “Enable CUDA Memory checker” is checked and follow the steps under point 1 to debug your application. If there are memory access violations stop right there! This is the first thing you should correct. Even if your algorithm runs and you are getting some results, there can be plenty of subtle bugs lying around when memory access violations happen. A common error that happens is because some threads access your arrays outside their index. So you need to block proceeding if a thread index is outside by including a return statement after an index range check like below:

int x_index = blockDim.x * blockIdx.x + threadIdx.x;
int y_index = blockDim.y * blockIdx.y + threadIdx.y;
 
if ((x_index >= cols) 
	|| (y_index >= rows)) {
	return;
}

  1. Understand Nsight debugging output

Make yourself familiar with the CUDA runtime error codes. Nsight will sometimes give output with an error such as “Program hit error 9 on execution”. Now, what you have to do is look up this error code with the documentation that you are using. Let’s look it up here – http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038. Aha! now we know what error 9 means. It says “This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requesting more shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks. See cudaDeviceProp for more device limitations.” We probably asked the kernel to use 100000 threads per block or something to that effect, which is out of the limit of threads that the device can use per block. Now, we know we need to check what the values we are passing and adjust that.

  1. Time your functions

This is something that I found extremely helpful. Here’s a simple C++ snippet I use:

Clock::time_point t0 = Clock::now();
CUDA_segment(pre_segmentation_img, post_segmentation_img, vis_img);
Clock::time_point t1 = Clock::now();
milliseconds ms = std::chrono::duration_cast<milliseconds>(t1 - t0);
std::cout << "Time taken for segmentation: " << ms.count() << "ms\n";

In addition to telling your execution time, which probably matters to you since you are trying to use CUDA, it also tells you if your CUDA execution failed. If you are getting a run time like 1ms for something that would usually take about 500ms, you need to hold your enthusiasm. Your algorithm didn’t suddenly become super fast. Your CUDA code probably ran into an error, and exited.

  1. Use a single thread and a single block and check sequential execution logic

If there is a problem with your algorithm and you need to understand why it’s failing, try simplifying your kernel execution to a single thread. This allow you to forget the complexity of parallel execution and debug it like a single threaded application. Just use block size = 1, and threads per block = 1. Also, do any additional modifications to your kernel code so that it goes on the same path every time you debug, i.e. if your processing an image, make sure it operates on the same sequences of pixels, by hard coding the x and y indices (x_index = 200, y_index = 200).

convert_2_closest_color <<<1, 1>>> (cuda_img, valid_colors_);
  1. Fast debugging – Use printf

After following step 3, I prefer to use a lot of printfs for debugging. This allows me to execute the code in “Release” mode, and see what exactly is going wrong at a fast execution speed.

NOTE: Make sure you disable all printfs through a macro when you want to use this code in production

  1. Write back your output to files and check your output

Even with debugging, the data structures you use are hard to check because of the massive parallelism that’s inherent with CUDA. Try to write out the effects of the intermediate steps of your algorithm by doing a cudaMemCpy from device to host. I usually write out the data into CSV files or image files and check the output for any issues that I can see. If you can visualize the data, you will notice a lot of issues that can result due to errors in your code.

I hope this helped to ease some of the pain that you are suffering due to programming CUDA. Don’t get me wrong I love CUDA, and I truly love the end execution times it gives for my algorithms. But debugging is quite a process and needs to get used to 🙂

Useful commands for debugging OSGI issues and constraints

These are a few commands that I find very useful when faced with OSGI issues.

1. ss / ss <string> (abbr. for short status)

Lists down the state of all installed bundles or if followed by a string, does a wildcard match of that string. Useful for initial investigation of checking whether a bundle is active, installed or not present at all.

osgi> ss bam

Framework is launched.


id State Bundle
143 RESOLVED org.wso2.bam.styles_2.0.0.SNAPSHOT
Master=290
150 ACTIVE org.wso2.carbon.bam.analyzer.stub_4.0.0.SNAPSHOT
151 ACTIVE org.wso2.carbon.bam.core.stub_4.0.0.SNAPSHOT
152 ACTIVE org.wso2.carbon.bam.gadgetgenwizard_4.0.0.SNAPSHOT
153 ACTIVE org.wso2.carbon.bam.gadgetgenwizard.stub_4.0.0.SNAPSHOT
154 ACTIVE org.wso2.carbon.bam.gadgetgenwizard.ui_4.0.0.SNAPSHOT
155 ACTIVE org.wso2.carbon.bam.presentation.stub_4.0.0.SNAPSHOT
156 ACTIVE org.wso2.carbon.bam2.core_4.0.0.SNAPSHOT
157 ACTIVE org.wso2.carbon.bam2.core.ui_4.0.0.SNAPSHOT
158 ACTIVE org.wso2.carbon.bam2.presentation_4.0.0.SNAPSHOT
159 ACTIVE org.wso2.carbon.bam2.receiver_4.0.0.SNAPSHOT
160 ACTIVE org.wso2.carbon.bam2.service_4.0.0.SNAPSHOT


2. b  <id> (abbr. for bundle)

Displays details for the specified bundles. Useful for checking what your bundle exports and imports when narrowing down situations like uses constraints. Also, helps with the OSGI services used.

osgi> b 152
org.wso2.carbon.bam.gadgetgenwizard_4.0.0.SNAPSHOT [152]
Id=152, Status=ACTIVE Data Root=/Users/mackie/source-checkouts/carbon/platform/trunk/products/bam2/modules/distribution/product/target/wso2bam-2.0.0-SNAPSHOT/repository/components/configuration/org.eclipse.osgi/bundles/152/data
No registered services.
Services in use:
{org.wso2.carbon.utils.ConfigurationContextService}={service.id=123}
{org.wso2.carbon.base.api.ServerConfigurationService}={service.id=83}
Exported packages
org.wso2.carbon.bam.gadgetgenwizard.internal; version="4.0.0.SNAPSHOT"[exported]
org.wso2.carbon.bam.gadgetgenwizard.service; version="4.0.0.SNAPSHOT"[exported]
Imported packages
org.apache.commons.logging; version="1.1.1"
org.wso2.carbon.utils; version="4.0.0.SNAPSHOT"
org.wso2.carbon.user.core; version="4.0.0.SNAPSHOT"
org.wso2.carbon.registry.core.session; version="1.0.1"
org.wso2.carbon.registry.core.exceptions; version="1.0.1"
org.wso2.carbon.registry.core; version="1.0.1"
org.wso2.carbon.registry.common.services; version="1.0.1"
org.wso2.carbon.base.api; version="1.0.0"
org.osgi.service.component; version="1.1.0"
org.apache.commons.io; version="2.0.0"
org.apache.axiom.om.impl.jaxp; version="1.2.11.wso2v1"
org.apache.axiom.om.impl.builder; version="1.2.11.wso2v1"
org.apache.axiom.om; version="1.2.11.wso2v1"
javax.xml.transform.stream; version="0.0.0"
javax.xml.transform; version="0.0.0"
javax.xml.stream; version="1.0.1"
javax.xml.namespace; version="0.0.0"
net.sf.saxon; version="9.0.0.x"
No fragment bundles
Named class space
org.wso2.carbon.bam.gadgetgenwizard; bundle-version="4.0.0.SNAPSHOT"[provided]
No required bundles

3. p <package> (abbr. for packages)

Shows the bundles that export and import the specified packages. Extremely useful in debugging most OSGI issues.

osgi> p org.wso2.carbon.utils
org.wso2.carbon.utils; version="4.0.0.SNAPSHOT"
axis2_1.6.1.wso2v5 [19] imports
org.wso2.carbon.analytics.hive_4.0.0.SNAPSHOT [145] imports
org.wso2.carbon.application.deployer_4.0.0.SNAPSHOT [147] imports
org.wso2.carbon.bam.gadgetgenwizard_4.0.0.SNAPSHOT [152] imports
org.wso2.carbon.bam2.core_4.0.0.SNAPSHOT [156] imports
org.wso2.carbon.bam2.receiver_4.0.0.SNAPSHOT [159] imports
org.wso2.carbon.cassandra.dataaccess_4.0.0.SNAPSHOT [163] imports
org.wso2.carbon.cassandra.mgt_4.0.0.SNAPSHOT [164] imports
org.wso2.carbon.cluster.mgt.core_4.0.0.SNAPSHOT [169] imports
org.wso2.carbon.coordination.core_4.0.0.SNAPSHOT [172] imports
org.wso2.carbon.core_4.0.0.SNAPSHOT [173] imports
org.wso2.carbon.core.bootup.validator_4.0.0.SNAPSHOT [174] imports
org.wso2.carbon.core.services_4.0.0.SNAPSHOT [177] imports
org.wso2.carbon.dashboard_4.0.0.SNAPSHOT [178] imports
org.wso2.carbon.dashboard.common_4.0.0.SNAPSHOT [179] imports
org.wso2.carbon.dashboard.dashboardpopulator_4.0.0.SNAPSHOT [180] imports
org.wso2.carbon.dashboard.ui_4.0.0.SNAPSHOT [182] imports
org.wso2.carbon.datasource_4.0.0.SNAPSHOT [183] imports
org.wso2.carbon.event.client_4.0.0.SNAPSHOT [187] imports
org.wso2.carbon.event.common_4.0.0.SNAPSHOT [189] imports
org.wso2.carbon.event.core_4.0.0.SNAPSHOT [190] imports
org.wso2.carbon.event.ws_4.0.0.SNAPSHOT [191] imports
org.wso2.carbon.jaggery.app.mgt_1.0.0.SNAPSHOT [222] imports
org.wso2.carbon.jaggery.app.mgt.ui_1.0.0.SNAPSHOT [224] imports
org.wso2.carbon.jaggery.deployer_1.0.0.SNAPSHOT [226] imports
org.wso2.carbon.logging.service_4.0.0.SNAPSHOT [232] imports
org.wso2.carbon.ndatasource.core_4.0.0.SNAPSHOT [236] imports
org.wso2.carbon.ntask.core_4.0.0.SNAPSHOT [239] imports
org.wso2.carbon.registry.common_4.0.0.SNAPSHOT [246] imports
org.wso2.carbon.registry.core_4.0.0.SNAPSHOT [248] imports
org.wso2.carbon.registry.resource.ui_4.0.0.SNAPSHOT [254] imports
org.wso2.carbon.registry.server_4.0.0.SNAPSHOT [258] imports
org.wso2.carbon.registry.servlet_4.0.0.SNAPSHOT [259] imports
org.wso2.carbon.reporting.template.core_4.0.0.SNAPSHOT [263] imports
org.wso2.carbon.security.mgt_4.0.0.SNAPSHOT [273] imports
org.wso2.carbon.security.mgt.ui_4.0.0.SNAPSHOT [275] imports
org.wso2.carbon.server.admin_4.0.0.SNAPSHOT [276] imports
org.wso2.carbon.server.admin.ui_4.0.0.SNAPSHOT [279] imports
org.wso2.carbon.service.mgt_4.0.0.SNAPSHOT [280] imports
org.wso2.carbon.transport.http_4.0.0.SNAPSHOT [285] imports
org.wso2.carbon.transport.https_4.0.0.SNAPSHOT [286] imports
org.wso2.carbon.transport.mgt_4.0.0.SNAPSHOT [287] imports
org.wso2.carbon.ui_4.0.0.SNAPSHOT [290] imports
org.wso2.carbon.user.core_4.0.0.SNAPSHOT [295] imports
org.wso2.carbon.user.mgt.ui_4.0.0.SNAPSHOT [299] imports
org.wso2.carbon.webapp.mgt_4.0.0.SNAPSHOT [301] imports
org.wso2.carbon.wsdl2form_4.0.0.SNAPSHOT [302] imports

4. diag <bid> (abbr. for diagnose)

Shows any unsatisfied constraints of the bundle.

osgi> diag 159
reference:file:plugins/org.wso2.carbon.bam2.receiver_4.0.0.SNAPSHOT.jar [159]
No unresolved constraints.

 

5. ls (abbr. for list services)

Lists down the state of all OSGI services. In this list the most important would be identifying the unsatisfied components as in component 20 below.

osgi> ls
All Components:
ID State Component Name Located in bundle
1 Registered org.eclipse.equinox.frameworkadmin.equinox org.eclipse.equinox.frameworkadmin.equinox(bid=108)
2 Active org.eclipse.equinox.p2.artifact.repository org.eclipse.equinox.p2.artifact.repository(bid=114)
3 Active org.eclipse.equinox.p2.core.eventbus org.eclipse.equinox.p2.core(bid=116)
4 Active org.eclipse.equinox.p2.di.agentProvider org.eclipse.equinox.p2.core(bid=116)
5 Registered org.eclipse.equinox.p2.director org.eclipse.equinox.p2.director(bid=117)
6 Active org.eclipse.equinox.p2.planner org.eclipse.equinox.p2.director(bid=117)
7 Active org.eclipse.equinox.p2.engine.registry org.eclipse.equinox.p2.engine(bid=120)
8 Active org.eclipse.equinox.p2.engine org.eclipse.equinox.p2.engine(bid=120)
9 Active org.eclipse.equinox.p2.garbagecollector org.eclipse.equinox.p2.garbagecollector(bid=122)
10 Active org.eclipse.equinox.p2.metadata.repository org.eclipse.equinox.p2.metadata.repository(bid=125)
11 Registered org.eclipse.equinox.p2.repository org.eclipse.equinox.p2.repository(bid=128)
12 Registered org.eclipse.equinox.p2.transport.ecf org.eclipse.equinox.p2.transport.ecf(bid=132)
13 Registered org.eclipse.equinox.p2.updatechecker org.eclipse.equinox.p2.updatechecker(bid=133)
14 Registered org.eclipse.equinox.simpleconfigurator.manipulator org.eclipse.equinox.simpleconfigurator.manipulator(bid=138)
15 Active bam.hive.component org.wso2.carbon.analytics.hive(bid=145)
16 Active application.deployer.dscomponent org.wso2.carbon.application.deployer(bid=147)
17 Active gadgetgenwizard.component org.wso2.carbon.bam.gadgetgenwizard(bid=152)
18 Active bam.utils.component org.wso2.carbon.bam2.core(bid=156)
19 Active bam.presentation.component org.wso2.carbon.bam2.presentation(bid=158)
20 Unsatisfied bam.receiver.component org.wso2.carbon.bam2.receiver(bid=159)
21 Active org.wso2.carbon.cassandra.dataaccess.component org.wso2.carbon.cassandra.dataaccess(bid=163)
22 Active org.wso2.carbon.cassandra.mgt.component org.wso2.carbon.cassandra.mgt(bid=164)

6. comp <component id> or ls -c <bundleid>

Lists component specific information regarding OSGI declarative services. Useful for debugging issues with declarative services.

osgi> ls -c 152
Components in bundle org.wso2.carbon.bam.gadgetgenwizard:
ID Component details
17 Component[
name = gadgetgenwizard.component
factory = null
autoenable = true
immediate = true
implementation = org.wso2.carbon.bam.gadgetgenwizard.internal.GadgetGenWizardServiceComponent
state = Unsatisfied
properties = {service.pid=gadgetgenwizard.component}
serviceFactory = false
serviceInterface = null
references = {
Reference[name = config.context.service, interface = org.wso2.carbon.utils.ConfigurationContextService, policy = dynamic, cardinality = 1..1, target = null, bind = setConfigurationContextService, unbind = unsetConfigurationContextService]
Reference[name = server.configuration, interface = org.wso2.carbon.base.api.ServerConfigurationService, policy = dynamic, cardinality = 1..1, target = null, bind = setServerConfiguration, unbind = unsetServerConfiguration]
}
located in bundle = org.wso2.carbon.bam.gadgetgenwizard_4.0.0.SNAPSHOT [152]
]
Dynamic information :
The component is satisfied
All component references are satisfied
Component configurations :
Configuration properties:
service.pid = gadgetgenwizard.component
component.name = gadgetgenwizard.component
component.id = 16
Instances:
org.eclipse.equinox.internal.ds.impl.ComponentInstanceImpl@3fabb84d
Bound References:
String[org.wso2.carbon.utils.ConfigurationContextService]
-> org.wso2.carbon.utils.ConfigurationContextService@22d0e7e3
String[org.wso2.carbon.base.api.ServerConfigurationService]
-> org.wso2.carbon.base.ServerConfiguration@4127f9f0