README.md 7.05 KB
Newer Older
Martin Schläffer committed
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194
# Reference and optimized C and ASM implementations of Ascon

Ascon is a family of lightweight authenticated encryption schemes with associated data (AEAD), including a hash and extendible output function (XOF).

For more information on Ascon visit: https://ascon.iaik.tugraz.at/

This repository contains the following 5 Ascon algorithms:

- `crypto_aead/ascon128v12`: Ascon-128 v1.2
- `crypto_aead/ascon128av12`: Ascon-128a v1.2
- `crypto_aead/ascon80pqv12`: Ascon-80pq v1.2
- `crypto_hash/asconhashv12`: Ascon-Hash v1.2
- `crypto_hash/asconxofv12`: Ascon-Xof v1.2

and the following implementations:

- `ref`: reference implementation
- `opt64`: 64-bit speed-optimized C implementation
- `opt64_lowsize`: 64-bit size-optimized C implementation
- `neon`: NEON speed-optimized ARM inline assembly implementation
- `bi32`: 32-bit speed-optimized bit-interleaved C implementation
- `bi32_lowsize`: 32-bit size-optimized bit-interleaved C implementation
- `bi32_lowreg`: 32-bit speed-optimized bit-interleaved C implementation (low register usage)
- `bi32_arm`: 32-bit speed-optimized bit-interleaved ARM inline assembly implementation
- `bi16`: 16-bit optimized bit-interleaved C implementation
- `bi8`: 8-bit optimized bit-interleaved C implementation


## Performance results of Ascon-128 on different CPUs in cycles per byte:

| Message Length in Bytes: |    1 |    8 |   16 |   32 |   64 | 1536 | long |
|:-------------------------|-----:|-----:|-----:|-----:|-----:|-----:|-----:|
| AMD Ryzen 7 1700\*       |      |      |      |      | 14.5 |  8.8 |  8.6 |
| Intel Xeon E5-2609 v4\*  |      |      |      |      | 17.3 | 10.8 | 10.5 |
| Cortex-A53 (ARMv8)\*     |      |      |      |      | 18.3 | 11.3 | 11.0 |
| Intel Core i5-6300U      |  367 |   58 |   35 |   23 | 17.6 | 11.9 | 11.4 |
| Intel Core i5-4200U      |  521 |   81 |   49 |   32 | 23.9 | 16.2 | 15.8 |
| Cortex-A15 (ARMv7)\*     |      |      |      |      | 69.8 | 36.2 | 34.6 |
| Cortex-A7 (NEON)         | 2182 |  249 |  148 |   97 | 71.7 | 47.5 | 46.5 |
| Cortex-A7 (ARMv7)        | 1871 |  292 |  175 |  115 | 86.6 | 58.3 | 57.2 |
| ARM1176JZF-S (ARMv6)     | 2189 |  340 |  202 |  133 | 97.9 | 64.4 | 65.3 |

\* Results taken from eBACS: http://bench.cr.yp.to/


## Performance results of Ascon-128a on different CPUs in cycles per byte:

| Message Length in Bytes: |    1 |    8 |   16 |   32 |   64 | 1536 | long |
|:-------------------------|-----:|-----:|-----:|-----:|-----:|-----:|-----:|
| AMD Ryzen 7 1700\*       |      |      |      |      | 12.0 |  6.0 |  5.7 |
| Intel Xeon E5-2609 v4\*  |      |      |      |      | 14.1 |  7.3 |  6.9 |
| Cortex-A53 (ARMv8)\*     |      |      |      |      | 15.1 |  7.6 |  7.3 |
| Intel Core i5-6300U      |  365 |   47 |   31 |   19 | 13.5 |  8.0 |  7.8 |
| Intel Core i5-4200U      |  519 |   67 |   44 |   27 | 18.8 | 11.0 | 10.6 |
| Cortex-A15 (ARMv7)\*     |      |      |      |      | 60.3 | 25.3 | 23.8 |
| Cortex-A7 (NEON)         | 2204 |  226 |  132 |   82 | 55.9 | 31.7 | 30.7 |
| Cortex-A7 (ARMv7)        | 1911 |  255 |  161 |  102 | 71.3 | 42.3 | 41.2 |
| ARM1176JZF-S (ARMv6)     | 2267 |  303 |  191 |  120 | 84.4 | 50.0 | 50.2 |

\* Results taken from eBACS: http://bench.cr.yp.to/


## Implementation interface

All implementations use the interface defined by the ECRYPT Benchmarking of Cryptographic Systems (eBACS):

- https://bench.cr.yp.to/call-aead.html for CRYPTO\_AEAD (Ascon-128, Ascon-128a, Ascon-80pq)
- https://bench.cr.yp.to/call-hash.html for CRYPTO\_HASH (Ascon-Hash) and XOF (Ascon-Xof)


## Manually build and run a single Ascon target:

Build example for CRYPTO\_AEAD algorithms: 

```
gcc -march=native -O3 -DNDEBUG -Icrypto_aead/ascon128v12/opt64 crypto_aead/ascon128v12/opt64/*.c -Itests tests/genkat_aead.c -o genkat
gcc -march=native -O3 -DNDEBUG -Icrypto_aead/ascon128v12/opt64 crypto_aead/ascon128v12/opt64/*.c -DCRYPTO_AEAD -Itests tests/getcycles.c -o getcycles
```

Build example for CRYPTO\_HASH algorithms: 

```
gcc -march=native -O3 -DNDEBUG -Icrypto_hash/asconhashv12/opt64 crypto_hash/asconhashv12/opt64/*.c -Itests tests/genkat_hash.c -o genkat
gcc -march=native -O3 -DNDEBUG -Icrypto_hash/asconhashv12/opt64 crypto_hash/asconhashv12/opt64/*.c -DCRYPTO_HASH -Itests tests/getcycles.c -o getcycles
```

Generate KATs and get CPU cycles:

```
./genkat
./getcycles
```


## Build and test all Ascon v1.2 targets using performance flags:

```
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .
ctest
```


## Build and test all Ascon v1.2 targets using NIST flags and sanitizers:

```
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug
cmake --build .
ctest
```


## Build and run only specific algorithms, implementations and tests:

Build and test:

```
mkdir build && cd build
cmake .. -DVERSION_LIST="v12" -DALG_LIST="ascon128;asconhash" -DIMPL_LIST="opt64;bi32" -DTEST_LIST="genkat;getcycles"
cmake --build .
ctest -R genkat
```

Get CPU cycles:

```
./getcycles_crypto_aead_ascon128v12_opt64
./getcycles_crypto_aead_ascon128v12_bi32
./getcycles_crypto_hash_asconhashv12_opt64
./getcycles_crypto_hash_asconhashv12_bi32
```


## Hints to get more reliable getcycles results on Intel/AMD CPUs:

* Determine the processor base frequency (also called design frequency):
  - e.g. using the Intel/AMD website
  - or using `lscpu` listed under model name

* Disable turbo boost (this should lock the frequency to the next value
  below the processor base frequency):
  ```
  echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
  ```

* If the above does not work, manually set the frequency using e.g. `cpufreq-set`.

* Determine the actual frequency (under load):
  - e.g. by watching the frequency using `lscpu` or `cpufreq-info`

* Determine the scaling factor between the actual and base frequency:
  - factor = actual frequency / base frequency

* Run the getcycles program using the frequency factor and watch the results:
  ```
  while true; do ./getcycles_crypto_aead_ascon128v12_opt64 $factor; done
  ```


## Hints to activate the performance monitor unit (PMU) on ARM CPUs:

* First try to install `linux-tools` and see if it works.

* On many ARM platforms, the PMU has to be enabled using a kernel module:
  - Source code for Armv6 (32-bit):
    <http://sandsoftwaresound.net/raspberry-pi/raspberry-pi-gen-1/performance-counter-kernel-module/>
  - Source code for Armv7 (32-bit):
    <https://github.com/thoughtpolice/enable_arm_pmu>
  - Source code for Armv8/Aarch64 (64-bit):
    <https://github.com/rdolbeau/enable_arm_pmu>

* Steps to compile the kernel module on the raspberry pi:
  - Find out the kernel version using `uname -a`
  - Download the kernel header files, e.g. `raspberrypi-kernel-header`
  - Download the source code for the Armv6 kernel module
  - Build, install and load the kernel module


## Benchmark Ascon v1.2 using supercop

Download supercop according to the website: http://bench.cr.yp.to/supercop.html

To test only Ascon, just run the following commands:

```
./do-part init
./do-part crypto_aead ascon128v12
./do-part crypto_aead ascon128av12
./do-part crypto_aead ascon80pqv12
./do-part crypto_hash asconhashv12
./do-part crypto_hash asconxofv12
```