This repository provides optimized implementations of matrix multiplication algorithms in C, leveraging advanced techniques to achieve high performance. We explore methods like blocking, vectorization, and loop reordering to maximize matrix multiplication efficiency.
- Algorithms
- Computer Architecture
- Data Processing
- High-Performance Computing
- Optimization
- Parallel Computing
Row-major order stores data row by row in memory. This technique ensures better spatial locality during row-wise traversals, reducing memory access times and improving overall performance.
Using optimization flags, such as -O3, adjusts how the compiler generates code by applying aggressive optimization techniques. This can significantly improve performance by reducing execution time and better utilizing the available hardware.
Loop reordering adjusts the order of nested loop iterations to optimize memory access patterns. This improves data locality and minimizes cache misses, making computations more efficient.
Vectorization leverages SIMD (Single Instruction, Multiple Data) instructions to perform parallel computations, accelerating processing on supported hardware architectures.
The blocks technique breaks matrices into smaller sub-blocks to optimize memory hierarchy usage. This reduces cache misses and enhances performance by improving memory access patterns.
Recursion involves breaking down a problem into smaller, self-similar subproblems. In the context of matrix multiplication, this entails recursively dividing the matrices into quadrants and multiplying the corresponding quadrants. This approach can lead to significant performance improvements, especially for larger matrices.
src/
: Contains the C source code with optimized algorithm implementations.bin/
: Contains the executables compiled from sources.
To compile the project, use a compiler like gcc
with appropriate optimization flags for SIMD and other techniques:
bin/exe_1
gcc -o bin/exe_1 src/mat_mul_1.c
bin/exe_2
gcc -o bin/exe_2 src/mat_mul_2.c
bin/exe_3
gcc -O3 -o bin/exe_3 src/mat_mul_2.c
bin/exe_4
gcc -O3 -o bin/exe_4 src/mat_mul_3.c
bin/exe_5
gcc -O3 -march=native -ftree-vectorize -o bin/exe_5 src/mat_mul_3.c
bin/exe_6
gcc -O3 -march=native -ftree-vectorize -o bin/exe_6 src/mat_mul_4.c
bin/exe_7
gcc -O3 -march=native -ftree-vectorize -o bin/exe_7 src/mat_mul_5.c