Table 6 Results of iterative optimization process for Example 2

From: Multistep schemes for solving backward stochastic differential equations on GPU

(a) Performance of the main kernels

| Time (%) | Time (s) | Kernel name |
|---|---|---|
| 48.35 | 8.04 | nrm2_kernel |
| 14.94 | 2.48 | sp_inter_non_grid_d_no_for |
| 13.70 | 2.28 | calc_f_and_c_exp_d |
| 6.17 | 1.03 | csrMv_kernel |
| 3.60 | 0.60 | calc_y |
| 3.53 | 0.89 | dot_kernel |
| 1.98 | 0.33 | reduce_1Block_kernel |
| 1.56 | 0.26 | axpby_kernel_val |
| 1.34 | 0.22 | calc_c_exp_d |

(b) Performance after first iteration of optimization process

| Time (%) | Time (s) | Kernel name |
|---|---|---|
| 27.88 | 2.49 | sp_inter_non_grid_d_no_for |
| 25.53 | 2.28 | calc_f_and_c_exp_d |
| 11.35 | 1.01 | csrMv_kernel |
| 9.64 | 0.86 | dot_kernel |
| 6.74 | 0.60 | calc_y |
| 5.22 | 0.47 | reduce_1Block_kernel |
| 2.65 | 0.24 | axpby_kernel_val |
| 2.50 | 0.22 | calc_c_exp_d |
| 1.76 | 0.16 | step_3 |

(c) Performance after second iteration of optimization process

| Time (%) | Time (s) | Kernel name |
|---|---|---|
| 22.23 | 1.46 | calc_f_and_c_exp_d |
| 17.67 | 1.16 | sp_inter_non_grid_d_no_for |
| 15.58 | 1.02 | csrMv_kernel |
| 12.86 | 0.84 | dot_kernel |
| 9.05 | 0.60 | calc_y |
| 7.21 | 0.47 | reduce_1Block_kernel |
| 3.41 | 0.22 | axpby_kernel_val |
| 2.38 | 0.16 | step_3 |
| 2.12 | 0.14 | copy_d |
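
The share and time columns of Table 6 can be combined to estimate the total profiled kernel time behind each panel. The following is a minimal sketch, assuming that Time (%) is a kernel's share of the total profiled time in that panel; this interpretation is not stated in the table itself. The names `top_rows` and `implied_total_seconds` are illustrative and do not come from the source, and only the first row of each panel is used.

```python
# First row of each panel of Table 6: (time share in percent, time in seconds).
top_rows = {
    "a": (48.35, 8.04),  # nrm2_kernel
    "b": (27.88, 2.49),  # sp_inter_non_grid_d_no_for
    "c": (22.23, 1.46),  # calc_f_and_c_exp_d
}


def implied_total_seconds(percent, seconds):
    """Total time implied by one row, assuming percent = 100 * seconds / total."""
    return seconds / (percent / 100.0)


# Hypothetical helper output: one implied total per panel.
totals = {
    panel: implied_total_seconds(percent, seconds)
    for panel, (percent, seconds) in top_rows.items()
}

for panel, total in totals.items():
    print(f"({panel}) implied total kernel time: {total:.2f} s")
```

Because the table entries are rounded to two decimals, the totals obtained this way are approximate, and other rows of the same panel may imply slightly different values.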