# RBO On-Board Testing

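To test an rbo on the board, use `rbo_executor`, the command-line tool provided by the runtime (RBRT), for performance testing. You can also use the runtime's Python interface to verify the rbo's results.

`rbo_executor` is a command-line tool released with the runtime that can test both the performance and the correctness of an rbo. For details, see the runtime's [rbo_executor documentation](https://document.corerain.com/zh/4-%E5%BA%94%E7%94%A8%E5%BC%80%E5%8F%91/%E8%BF%90%E8%A1%8C%E6%97%B6%E8%84%9A%E6%9C%AC%E8%AF%B4%E6%98%8E%E6%89%8B%E5%86%8C).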

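- [RBO On-Board Testing](#rbo-on-board-testing)
  - [1. rbo\_executor](#1-rbo_executor)
    - [1.1 rbo\_executor Environment Setup](#11-rbo_executor-environment-setup)
    - [1.2 rbo Performance Testing](#12-rbo-performance-testing)
      - [1.2.1 ResNet50 Performance Test](#121-resnet50-performance-test)
  - [2. RBRT Python Interface](#2-rbrt-python-interface)
    - [2.1 RBRT Python Interface Environment Setup](#21-rbrt-python-interface-environment-setup)
    - [2.2 RBRT Python Interface Overview](#22-rbrt-python-interface-overview)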

## 1. rbo_executor

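To help you get started quickly, this document walks through an example of using `rbo_executor`. All of the following steps are performed on the hardware box.

### 1.1 rbo_executor Environment Setup

Enter the container environment provided by the runtime and first check whether the RBRT service is running: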

```shell
bash-5.2# ps -a | grep rt_api_server
112097 pts/0    01:18:52 rt_api_server
```

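If the runtime service is not found, start it by following the service startup tutorial in the RBRT documentation.

Run the following command to check that `rbo_executor` is available: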

```shell
bash-5.2# rbo_executor --help
usage: rbo_executor --mode=string [options] ...
options:
  -r, --rbo              Path of rbo file. (string [=])
  -m, --mode             Testing mode, ‘acc’ for accuracy test, ‘perf’ for performance test. (string)
  -i, --data-dir         Directory of dataset. Use random data if not provide. Input shape info should be acquired from the rbo file. (string [=])
      --input-number     Input number of model. (int [=1])
  -f, --input-files      Files of dataset. (string [=])
  -o, --output-dir       Directory of file recording inference output. Effective only when ‘data-dir’ is provided. Output will not be recorded if ‘data-dir’ is provided but ‘output-file’ is not. (string [=])
  -d, --device           List of device id for testing, joined by comma ‘,’, e.g. 0,1 for using both AI engine. Default is 0. (string [=0])
  -s, --stream           Number of streams for requesting parallelly, must be greater than 0. Default is 1. (int [=1])
  -b, --batch-size       Number of samples per iteration per stream, must be greater than 0. Default is 1. (int [=1])
  -u, --duration         Time in seconds for testing. Effective when mode is ‘perf’. (int [=0])
  -e, --iteration        Number of batches for testing. (int [=0])
  -g, --graph            The default is rbc model testing. (int [=0])
      --accuracy_eval    Check the result by input name. Default is 0. (int [=0])
      --acc_debug        Output the check result. Default is false. (bool [=0])
      --infer_sync       Infer use sync infer or async infer interface. Default is true. (bool [=1])
      --model_batch      Model's batch. Default is 1. (int [=1])
      --iv               iv files, split by ",". (string [=])
      --src_len          src rbo's length, split by ",". (string [=])
      --input_rbos       rbo list file, line: rbo or rbo:iv:src_len (string [=])
  -h, --help             print help information
  -v, --version          print version information
examples:
  acc example:
   ./rbo_executor --rbo=/root/test_data_and_model/rbo/swin_t.rbo --mode=acc --data-dir=/root/test_data_and_model/data/swin_tiny_1000/ --device=0 --stream=8 --output-dir=/root/example_test/output/
   ./rbo_executor -r /root/test_data_and_model/rbo/swin_t.rbo -m acc -i /root/test_data_and_model/data/swin_tiny_1000/ -d 0 -s 8 -o /root/example_test/output/
  perf example:
   ./rbo_executor -r /root/test_data_and_model/rbo/swin_t.rbo -m perf -d 0,1 -s 8 -b 1 -e 10
bash-5.2#
```

If the shell reports `command not found`, configure the environment variables for `rbo_executor` as follows:

```shell
bash-5.2# export PATH=/opt/rb/rbrt/api/bin:$PATH
bash-5.2# export LD_LIBRARY_PATH=/opt/rb/rbrt/api/lib:$LD_LIBRARY_PATH
```

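You can save these lines in `rbo_env.sh` so that next time you only need to run `source rbo_env.sh`.

### 1.2 rbo Performance Testing

Performance tests are run with `rbo_executor`, the command-line tool provided by the runtime. For example: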

```shell
bash-5.2# rbo_executor -r resnet50.rbo -u 20 -m perf -d 0 -s 8
```

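- `-r`: path to the rbo file.
- `-u`: test duration in seconds (20 s in this example).
- `-m`: run mode, either `perf` (performance) or `acc` (accuracy).
- `-s`: number of streams (threads) that issue requests in parallel.
- `-f`: input files. Some models, such as `faster_rcnn_r50`, are special: with random input, the second half of the model can end up with empty inputs, so a valid input should be supplied here. When no input file is given, random input is used.
- `-d`: engine ID, such as `0` or `1`. Join multiple IDs with commas, for example `0,1`.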
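
When the same test is launched repeatedly, for example from a script, it can help to assemble the command programmatically. The following is a minimal sketch that uses only the flags listed above; the helper name `build_perf_command` and its defaults are hypothetical, and passing a single string to `-f` is an assumption, since the exact format for multiple input files is not described here.

```python
import shlex


def build_perf_command(rbo_path, duration_s=20, devices=(0,), streams=8, input_files=None):
    """Assemble an rbo_executor performance-test command as an argument list.

    Hypothetical helper: it only mirrors the flags documented above.
    """
    if duration_s <= 0 or streams <= 0:
        raise ValueError("duration_s and streams must be positive")
    cmd = [
        "rbo_executor",
        "-r", str(rbo_path),                       # rbo path
        "-m", "perf",                              # performance mode
        "-u", str(duration_s),                     # duration in seconds
        "-d", ",".join(str(d) for d in devices),   # engine IDs, comma-joined
        "-s", str(streams),                        # parallel streams
    ]
    if input_files:
        # Assumption: a single string is passed through unchanged.
        cmd += ["-f", str(input_files)]
    return cmd


if __name__ == "__main__":
    cmd = build_perf_command("resnet50.rbo", duration_s=20, devices=(0,), streams=8)
    # Prints: rbo_executor -r resnet50.rbo -m perf -u 20 -d 0 -s 8
    print(shlex.join(cmd))
```

On the device, the returned list could be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell quoting problems with paths.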

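Next, we use a ResNet50 performance test as an example to show how to measure rbo performance.

#### 1.2.1 ResNet50 Performance Test

Here we use the CAISA N430P box as an example. This product has 2 engines and an int8 compute capacity of 16T.

Single-engine test: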

```shell
bash-5.2# rbo_executor -r resnet50.rbo -m perf -u 5 -d 0 -s 16
Product: N430P with ID 2.
Product: N430P with ID 2.
==== infered[0]: 89, time cost: 275.55 ms, avg 3.09607 ms ====
==== real time fps: 322.99
++++ total fps[0.275738(s)]: resnet50 322.77 ++++
==== infered[0]: 418, time cost: 1275.84 ms, avg 3.05226 ms ====
==== real time fps: 327.626
++++ total fps[1.27593(s)]: resnet50 327.605 ++++
++++ usage[0]: cur te(0.499692), cur ve(0.009690), max te(0.499692), max ve(0.009690), avg te(0.499692), avg ve(0.009690), cur cpu(0.351396), max cpu(0.351396), avg cpu(0.351396), cur mem(40.465) MB, max mem(40.465) MB ++++
==== infered[0]: 750, time cost: 2276.06 ms, avg 3.03475 ms ====
==== real time fps: 329.517
++++ total fps[2.27614(s)]: resnet50 329.504 ++++
++++ usage[0]: cur te(0.499692), cur ve(0.009690), max te(0.499692), max ve(0.009690), avg te(0.499692), avg ve(0.009690), cur cpu(0.199170), max cpu(0.351396), avg cpu(0.275283), cur mem(40.465) MB, max mem(40.465) MB ++++
==== infered[0]: 1080, time cost: 3276.24 ms, avg 3.03356 ms ====
==== real time fps: 329.646
++++ total fps[3.27633(s)]: resnet50 329.637 ++++
++++ usage[0]: cur te(0.677867), cur ve(0.013138), max te(0.677867), max ve(0.013138), avg te(0.559084), avg ve(0.010839), cur cpu(0.237522), max cpu(0.351396), avg cpu(0.262696), cur mem(40.465) MB, max mem(40.465) MB ++++
==== infered[0]: 1413, time cost: 4276.44 ms, avg 3.02649 ms ====
==== real time fps: 330.415
++++ total fps[4.27653(s)]: resnet50 330.408 ++++
++++ usage[0]: cur te(0.677867), cur ve(0.013138), max te(0.677867), max ve(0.013138), avg te(0.588780), avg ve(0.011414), cur cpu(0.229904), max cpu(0.351396), avg cpu(0.254498), cur mem(5.742) MB, max mem(40.465) MB ++++
++++ total fps[5.27664(s)]: resnet50 338.094 ++++
--------------------------------- rbo list ---------------------------------
resnet50.rbo
--------------------------------- statistics ---------------------------------
device[0]: infered 1784, total 5114.155762 ms, avg 2.866679 ms, max te 0.677867, max ve 0.013138, avg te 0.588780 , avg ve 0.011414 , max cpu 0.351396, avg cpu 0.254498, max mem 40.465 (MB)
fps avg: 348.84, fps max: 330.42, infer avg: 2.87 ms, cpu avg: 25.45 %
```

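The log shows that with a single engine the average fps is 348.84 and the average inference time is 2.87 ms, using 8T of compute.

> This example is for demonstration only and does not represent the best performance of this model on this device.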
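
The figures quoted above come from the last line of the log. If you collect logs from several runs, a small parser can extract them. This is a minimal sketch that assumes the summary line keeps the exact format shown in the log above; the function name `parse_summary` and the dictionary keys are hypothetical.

```python
import re

# Matches the final summary line printed by the run shown above, e.g.
# "fps avg: 348.84, fps max: 330.42, infer avg: 2.87 ms, cpu avg: 25.45 %"
SUMMARY_RE = re.compile(
    r"fps avg:\s*([\d.]+),\s*fps max:\s*([\d.]+),\s*"
    r"infer avg:\s*([\d.]+)\s*ms,\s*cpu avg:\s*([\d.]+)\s*%"
)


def parse_summary(log_text):
    """Return the last summary line's metrics as floats, or None if absent."""
    matches = SUMMARY_RE.findall(log_text)
    if not matches:
        return None
    fps_avg, fps_max, infer_avg_ms, cpu_avg_pct = (float(v) for v in matches[-1])
    return {
        "fps_avg": fps_avg,
        "fps_max": fps_max,
        "infer_avg_ms": infer_avg_ms,
        "cpu_avg_pct": cpu_avg_pct,
    }


if __name__ == "__main__":
    line = "fps avg: 348.84, fps max: 330.42, infer avg: 2.87 ms, cpu avg: 25.45 %"
    print(parse_summary(line))
```

If a different `rbo_executor` version prints the summary differently, the pattern would need to be adjusted.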

Dual-engine test:

```shell
bash-5.2# rbo_executor -r resnet50.rbo -m perf -u 5 -d 0,1 -s 16
Product: N430P with ID 2.
Product: N430P with ID 2.
==== infered[0]: 82, time cost: 260.46 ms, avg 3.17634 ms ====
==== infered[1]: 83, time cost: 260.598 ms, avg 3.13973 ms ====
==== real time fps: 633.159
++++ total fps[0.260623(s)]: resnet50 633.098 ++++
==== infered[0]: 405, time cost: 1260.8 ms, avg 3.11307 ms ====
==== infered[1]: 406, time cost: 1260.82 ms, avg 3.10548 ms ====
==== real time fps: 643.23
++++ total fps[1.26085(s)]: resnet50 644.012 ++++
++++ usage[0]: cur te(0.047126), cur ve(0.000935), max te(0.047126), max ve(0.000935), avg te(0.047126), avg ve(0.000935), cur cpu(0.718412), max cpu(0.718412), avg cpu(0.718412), cur mem(40.465) MB, max mem(40.465) MB ++++
++++ usage[1]: cur te(0.659074), cur ve(0.012895), max te(0.659074), max ve(0.012895), avg te(0.659074), avg ve(0.012895), cur cpu(0.718412), max cpu(0.718412), avg cpu(0.718412), cur mem(40.465) MB, max mem(40.465) MB ++++
==== infered[0]: 730, time cost: 2261.01 ms, avg 3.09728 ms ====
==== infered[1]: 730, time cost: 2261.03 ms, avg 3.09731 ms ====
==== real time fps: 645.723
++++ total fps[2.26105(s)]: resnet50 645.717 ++++
++++ usage[0]: cur te(0.663773), cur ve(0.012873), max te(0.663773), max ve(0.012873), avg te(0.355449), avg ve(0.006904), cur cpu(0.455927), max cpu(0.718412), avg cpu(0.587169), cur mem(40.465) MB, max mem(40.465) MB ++++
++++ usage[1]: cur te(0.659074), cur ve(0.012895), max te(0.659074), max ve(0.012895), avg te(0.659074), avg ve(0.012895), cur cpu(0.455927), max cpu(0.718412), avg cpu(0.587169), cur mem(40.465) MB, max mem(40.465) MB ++++
==== infered[0]: 1052, time cost: 3261.21 ms, avg 3.10001 ms ====
==== infered[1]: 1057, time cost: 3261.23 ms, avg 3.08537 ms ====
==== real time fps: 646.688
++++ total fps[3.26125(s)]: resnet50 646.684 ++++
++++ usage[0]: cur te(0.666245), cur ve(0.013063), max te(0.666245), max ve(0.013063), avg te(0.459048), avg ve(0.008957), cur cpu(0.479600), max cpu(0.718412), avg cpu(0.551313), cur mem(40.836) MB, max mem(40.836) MB ++++
++++ usage[1]: cur te(0.666258), cur ve(0.012953), max te(0.666258), max ve(0.012953), avg te(0.661469), avg ve(0.012915), cur cpu(0.479600), max cpu(0.718412), avg cpu(0.551313), cur mem(40.836) MB, max mem(40.836) MB ++++
==== infered[0]: 1398, time cost: 4261.42 ms, avg 3.04823 ms ====
==== infered[1]: 1405, time cost: 4261.45 ms, avg 3.03306 ms ====
==== real time fps: 657.757
++++ total fps[4.26148(s)]: resnet50 657.753 ++++
++++ usage[0]: cur te(0.666245), cur ve(0.013063), max te(0.666245), max ve(0.013063), avg te(0.510847), avg ve(0.009983), cur cpu(0.588335), max cpu(0.718412), avg cpu(0.560568), cur mem(5.742) MB, max mem(40.836) MB ++++
++++ usage[1]: cur te(0.666258), cur ve(0.012953), max te(0.666258), max ve(0.012953), avg te(0.662666), avg ve(0.012924), cur cpu(0.588335), max cpu(0.718412), avg cpu(0.560568), cur mem(5.742) MB, max mem(40.836) MB ++++
++++ total fps[5.26158(s)]: resnet50 684.776 ++++
--------------------------------- rbo list ---------------------------------
resnet50.rbo
--------------------------------- statistics ---------------------------------
device[0]: infered 1796, total 5106.937988 ms, avg 2.843507 ms, max te 0.666245, max ve 0.013063, avg te 0.510847 , avg ve 0.009983 , max cpu 0.718412, avg cpu 0.560568, max mem 40.836 (MB)
device[1]: infered 1807, total 5107.835938 ms, avg 2.826694 ms, max te 0.666258, max ve 0.012953, avg te 0.662666 , avg ve 0.012924 , max cpu 0.718412, avg cpu 0.560568, max mem 40.836 (MB)
fps avg: 705.45, fps max: 657.76, infer avg: 1.42 ms, cpu avg: 56.06 %
```

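The log shows that with two engines, ResNet50 at batch size 1 reaches a throughput of 705.45 fps (the `fps avg` value) with an average inference time of 1.42 ms, using 16T of compute.

> This example is for demonstration only and does not represent the best performance of this model on this device.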
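
To compare the two runs, you can compute how throughput scales from one engine to two. The sketch below uses only the `fps avg` values from the two logs above; the function name `scaling` is hypothetical, and the result reflects these short 5-second demo runs rather than a benchmark.

```python
def scaling(single_fps, multi_fps, engines):
    """Return (speedup, per-engine efficiency) of a multi-engine run."""
    if single_fps <= 0 or engines <= 0:
        raise ValueError("single_fps and engines must be positive")
    speedup = multi_fps / single_fps
    return speedup, speedup / engines


if __name__ == "__main__":
    # fps avg from the single-engine and dual-engine logs above.
    speedup, efficiency = scaling(348.84, 705.45, 2)
    # Prints: speedup: 2.02x, efficiency: 101.1%
    print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.1%}")
```

An efficiency slightly above 100% is within the run-to-run noise expected from such short measurements.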

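## 2. RBRT Python Interface

RBRT also provides a Python interface, which you can use to run and test an RBO.

### 2.1 RBRT Python Interface Environment Setup

Set the Python environment variable and check that the module can be imported: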

```shell
$ export PYTHONPATH=/opt/rb/rbrt/api/python/:$PYTHONPATH
$ python -c "import pyrbrt"
Product: N430P with ID 2.
```

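The RBRT Python interface requires no installation; setting `PYTHONPATH` is enough. The log output above indicates that the RBRT Python environment is configured correctly.

### 2.2 RBRT Python Interface Overview

The main interface is currently: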

```python
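from pyrbrt import PYRunner

# Load the rbo on the given device ID.
runner = PYRunner(rbo_path, device_id)
# Single-input model: pass one numpy array.
outputs = runner(x)
# Multi-input model: pass one numpy array per input.
outputs = runner(x1, x2, x3)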
```

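Both inputs and outputs are numpy array objects.

> Note that the returned `outputs` is a tuple. If the rbo has a single output, retrieve it with `outputs = outputs[0]`.

This Python interface is intended only for testing the correctness of an rbo and is not recommended for production use.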
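
Because the runner always returns a tuple, test scripts often need the same unwrapping step. The following is a minimal sketch of a wrapper that does it once; `run_and_unwrap` is a hypothetical helper, not part of `pyrbrt`, and the stub runner below only stands in for a real `PYRunner` so that the example runs without the hardware.

```python
def run_and_unwrap(runner, *inputs):
    """Call a PYRunner-like callable and unwrap a single-output tuple.

    Hypothetical helper: multi-output results are returned unchanged.
    """
    outputs = runner(*inputs)
    if isinstance(outputs, tuple) and len(outputs) == 1:
        return outputs[0]
    return outputs


if __name__ == "__main__":
    # Stub standing in for a real runner; it returns a one-element tuple.
    def fake_runner(x):
        return (x * 2,)

    print(run_and_unwrap(fake_runner, 21))

    # On the device, the same helper would wrap the real runner:
    #   from pyrbrt import PYRunner
    #   runner = PYRunner(rbo_path, device_id)
    #   result = run_and_unwrap(runner, x)
```

Taking the runner as a parameter keeps the helper testable off-device and leaves multi-output models untouched.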