Efficient Implementation of Matrix-Matrix Multiplication for Contemporary Multi-Core Processors

Description of the work:

The goal of this work is to develop an optimized implementation of (small) matrix-matrix multiplication for contemporary multi-core processors. Matrix-matrix multiplication is a performance-critical component of many deep learning applications. The student is expected to start from a naive matrix-matrix multiplication implementation in C and iteratively apply optimizations, such as SIMD vectorization and cache blocking, to improve its performance.
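
As a rough illustration of the intended starting point and a first optimization step, the following sketch shows a naive triple-loop multiplication next to a cache-blocked variant. It is not a prescribed solution: the function names gemm_naive and gemm_blocked, the tile size BLOCK, the row-major single-precision layout, and the accumulate-into-C convention (C += A * B) are all assumptions made for this example.

```c
#include <stddef.h>

/* Assumed conventions (not fixed by the task description):
 * row-major storage, single precision, and C += A * B,
 * where A is m x k, B is k x n, and C is m x n. */

/* Naive starting point: one dot product per element of C.
 * The innermost loop walks B with stride n, which is cache-unfriendly. */
void gemm_naive(size_t m, size_t n, size_t k,
                const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = c[i * n + j];
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Hypothetical tile edge length; a real value would be tuned
 * to the cache sizes of the target processor. */
#define BLOCK 32

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Cache-blocked variant: the iteration space is split into tiles so that
 * the working set of the three inner loops stays small. The innermost
 * loop runs over j with unit stride in both B and C, which gives a
 * compiler the opportunity to auto-vectorize it. */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const float *a, const float *b, float *c)
{
    for (size_t ii = 0; ii < m; ii += BLOCK) {
        size_t i_end = min_sz(ii + BLOCK, m);
        for (size_t pp = 0; pp < k; pp += BLOCK) {
            size_t p_end = min_sz(pp + BLOCK, k);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t p = pp; p < p_end; ++p) {
                        float a_ip = a[i * k + p];
                        for (size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += a_ip * b[p * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions take the same arguments, so the naive version can serve as a correctness reference while the blocked version is tuned further, for example by replacing the innermost loop with explicit SIMD intrinsics or by distributing the outer tile loops across cores.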