CUDA Core

  Sum   =  Sum + ( a x b) 

   D  =  C  + ( A x B)   

where A, B, C and D are scalar numbers 

TP Core ( Tensor processing core )

D  =  C  + ( A x B)   

where A, B, C and D are  m x n matrices

   1 <=  m  <= 10

   1 <= n <= 10  

Examples in py, C++ and in cu

( nvcc is used and also cmake is used to generate make file. )

Easier Introduction to CUDA cores and runing program on CUDA cores by using C++ code.