From: Multistep schemes for solving backward stochastic differential equations on GPU
Time (%) | Time (s) | Kernel name |
---|---|---|
(a) Performance of the main kernels | | |
48.35 | 8.04 | nrm2_kernel |
14.94 | 2.48 | sp_inter_non_grid_d_no_for |
13.70 | 2.28 | calc_f_and_c_exp_d |
6.17 | 1.03 | csrMv_kernel |
3.60 | 0.60 | calc_y |
3.53 | 0.89 | dot_kernel |
1.98 | 0.33 | reduce_1Block_kernel |
1.56 | 0.26 | axpby_kernel_val |
1.34 | 0.22 | calc_c_exp_d |
(b) Performance after first iteration of optimization process | | |
27.88 | 2.49 | sp_inter_non_grid_d_no_for |
25.53 | 2.28 | calc_f_and_c_exp_d |
11.35 | 1.01 | csrMv_kernel |
9.64 | 0.86 | dot_kernel |
6.74 | 0.60 | calc_y |
5.22 | 0.47 | reduce_1Block_kernel |
2.65 | 0.24 | axpby_kernel_val |
2.50 | 0.22 | calc_c_exp_d |
1.76 | 0.16 | step_3 |
(c) Performance after second iteration of optimization process | | |
22.23 | 1.46 | calc_f_and_c_exp_d |
17.67 | 1.16 | sp_inter_non_grid_d_no_for |
15.58 | 1.02 | csrMv_kernel |
12.86 | 0.84 | dot_kernel |
9.05 | 0.60 | calc_y |
7.21 | 0.47 | reduce_1Block_kernel |
3.41 | 0.22 | axpby_kernel_val |
2.38 | 0.16 | step_3 |
2.12 | 0.14 | copy_d |
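
The table reports each kernel's share of the runtime and its absolute time, so the total profiled time of each panel can be estimated by dividing a kernel's time by its share. The Python sketch below does this for the top-ranked kernel of each panel. It assumes that the percentages are taken relative to the total profiled kernel time, which the table does not state explicitly. The helper name `implied_total` is illustrative only, and the results are approximate because the table values are rounded to two decimals.

```python
# Top-ranked kernel of each panel, copied from the table: (share in %, time in s).
top_rows = {
    "a": (48.35, 8.04),  # nrm2_kernel
    "b": (27.88, 2.49),  # sp_inter_non_grid_d_no_for
    "c": (22.23, 1.46),  # calc_f_and_c_exp_d
}


def implied_total(share_percent, seconds):
    """Total time implied by one row, assuming share = time / total (hypothetical helper)."""
    return seconds / (share_percent / 100.0)


# Estimated total profiled time per panel, in seconds.
totals = {panel: implied_total(*row) for panel, row in top_rows.items()}

for panel, total in totals.items():
    print(f"({panel}) implied total: {total:.2f} s")

# Overall reduction factor between panel (a) and panel (c).
print(f"(a) / (c): {totals['a'] / totals['c']:.2f}x")
```

Under that assumption, most of the drop from (a) to (b) comes from `nrm2_kernel` leaving the list, and the drop from (b) to (c) comes from the shorter times of `sp_inter_non_grid_d_no_for` and `calc_f_and_c_exp_d`.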