# 20018. Square Root of Vector Elements (SIMD)

## I'm a slow walker, but I never walk backwards.

### Background

Intel 公司在 X86 指令集上提供了 AVX (Advanced Vector Extensions)、SSE (Streaming SIMD Extensions) 的特殊指令，這些在 SIMD 指令上提供強力的加速。

## Problem

### main.c

12345678910111213141516171819202122232425262728#include <stdio.h>#include <memory.h>#include <math.h>#include <time.h>#include "VSQRT.h"  static float a[1048576], b[1048576];int main() {    int N = 1048576;    for (int i = 0; i < N; ++i)        a[i] = i;     for (int it = 0; it < 20; it++)    {        memcpy(b, a, sizeof(a[0])*N);        clock_t t = clock();        for (int i = 0; i < 10; i++)            sqrt2(b, b+N);        t = clock() - t;        fprintf(stderr, "It took me %f seconds.\n", ((float) t)/CLOCKS_PER_SEC);        float sum = 0;        for (int i = 0; i < N; i++)            sum += b[i];        printf("%f\n", sum);    }    return 0;}


### VSQRT.h

123456#ifndef __VSQRT_H#define __VSQRT_H void sqrt2(float *begin, float *end); #endif


### VSQRT.c

12345678#include "VSQRT.h" #include <math.h> void sqrt2(float *begin, float *end) {     for (; begin != end; begin++)         *begin = sqrt(*begin);}


no input

## Sample Output (stdout)

12345678910111213141516171819201051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.1250001051776.125000


## Makefile

all: main main: main.c VSQRT.o        gcc -Wall -std=c99 -O3 main.c VSQRT.o -o main -lm VSQRT.o: VSQRT.c        gcc -Wall -std=c99 -O3 -msse -mavx VSQRT.c -c -o VSQRT.o -lmtest:        time -p ./main