Getting Started

For an easy start, simply visit the Quickstart or Quickstart from Hub pages.

Installation / Integration into CMake projects

Integrating blace.ai into your project is super simple. You need to:

  1. Download the blace.ai package from github.com/blace-ai/blace-ai/releases and place it in your project.
  2. Include the blace.ai CMake configuration in your CMakeLists.txt:
    include("../cmake/FindBlace.cmake")
  3. Link your target to blace.ai:
    target_link_libraries(<your_target> PRIVATE 3rdparty::BlaceAI)
  4. Add a post-build step to copy all blace.ai libraries next to your executable:
    foreach(DLL_FILE ${BLACE_AI_COPY_LIBS})
      add_custom_command(TARGET <your_target> POST_BUILD
        COMMAND ${CMAKE_COMMAND} -E copy "${DLL_FILE}" $<TARGET_FILE_DIR:<your_target>>
      )
    endforeach()

Important: This setup is identical on all operating systems, so you can start developing on e.g. Windows and later deploy from Ubuntu by simply downloading the Ubuntu version of the package there; the folder structure is always the same. The same goes for coding: the same code works across all operating systems.
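
Putting the four steps together, a minimal CMakeLists.txt could look like the sketch below. The target name my_blace_app and the location of the unpacked package (a blace_ai subfolder of the project) are placeholders; adjust them to your own layout:

cmake_minimum_required(VERSION 3.16)
project(my_blace_app)

add_executable(my_blace_app main.cpp)

# step 2: make the blace.ai targets available
# (path assumes the package was unpacked into <project>/blace_ai)
include("${CMAKE_CURRENT_SOURCE_DIR}/blace_ai/cmake/FindBlace.cmake")

# step 3: link against blace.ai
target_link_libraries(my_blace_app PRIVATE 3rdparty::BlaceAI)

# step 4: copy the blace.ai runtime libraries next to the executable
foreach(DLL_FILE ${BLACE_AI_COPY_LIBS})
  add_custom_command(TARGET my_blace_app POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E copy "${DLL_FILE}" $<TARGET_FILE_DIR:my_blace_app>
  )
endforeach()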

Usage

Add

#include "blace_ai.h"

at the top of your source file. This will make all headers available.

blace.ai makes use of a so-called computation graph to execute all commands. This allows for implicit caching of reusable results (like model inferences with the exact same arguments) across runs. Usage is therefore split into two phases: graph construction and graph execution.

Model Registering / Execution

All models coming from the model hub or our Model Wizard consist of two artifacts: the .h model header and the .bin payload(s).

In order to run a model you need to:

  1. Include the header
    #include "depth_anything_v2_v8_small_v3_ALL_export_version_v17.h"
    This will provide you with
    const std::vector<char> depth_anything_v2_v8_small_v3_ALL_export_version_v17 // the encoded model metadata
    const std::string depth_anything_v2_v8_small_v3_ALL_export_version_v17_IDENT // model identification string
    inline blace::ops::OpP depth_anything_v2_v8_small_v3_ALL_export_version_v17_run(..., int return_index, blace::ml_core::InferenceArgsCollection inference_args, std::string payload_folder) // a method with all arguments exposed (only for models coming from the hub)
  2. When you later construct an inference operation, you pass in the encoded model metadata. This automatically registers the model with the library.
    auto infer_op = CONSTRUCT_OP(ops::InferenceOp(
    depth_anything_v2_v8_small_v3_ALL_export_version_v17, {interpolated},
    infer_args, 0, util::getPathToExe().string()));
    The last argument is the path where the .bin model payload(s) are searched for during execution; in this case we expect them next to the executable.
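
A minimal sketch of both steps together could look like the following. The FromIntOp input node and the default-constructed infer_args are purely illustrative (a real Depth Anything graph feeds an image node, called interpolated in the demo code):

#include "blace_ai.h"

// model header generated by the hub / Model Wizard
#include "depth_anything_v2_v8_small_v3_ALL_export_version_v17.h"

int main() {
  // inference arguments (backend selection etc.), default-constructed here
  blace::ml_core::InferenceArgsCollection infer_args;

  // placeholder input node; a real graph would construct an image input instead
  auto input = CONSTRUCT_OP_GET(blace::ops::FromIntOp(0));

  // constructing the inference op registers the model; the last argument is the
  // folder searched for the .bin payload(s) at execution time (next to the exe)
  auto infer_op = CONSTRUCT_OP(blace::ops::InferenceOp(
      depth_anything_v2_v8_small_v3_ALL_export_version_v17, {input},
      infer_args, 0, blace::util::getPathToExe().string()));

  return 0;
}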

Graph building

First, you construct the computation graph (a DAG). Refer to public_ops.h for an overview of all available operators (we offer a limited set during the beta but will roll out the remaining operators soon).

Such a construction could look like this (taken from the Gemma demo project):

auto text_t = CONSTRUCT_OP_GET(blace::ops::FromTextOp(str));
auto output_len = CONSTRUCT_OP_GET(blace::ops::FromIntOp(200));
auto temperature = CONSTRUCT_OP_GET(blace::ops::FromFloatOp(0.));
auto top_p = CONSTRUCT_OP_GET(blace::ops::FromFloatOp(0.9));
auto top_k = CONSTRUCT_OP_GET(blace::ops::FromIntOp(50));
blace::ml_core::InferenceArgsCollection infer_args; // holds all model inference arguments
// get available accelerator (cuda or metal device)
infer_args.inference_args.backend = {blace::ml_core::TORCHSCRIPT_CUDA_FP16};
auto infer_op = CONSTRUCT_OP_GET(blace::ops::InferenceOp(
    gemma_v1_default_v1_ALL_export_version_v10,
    {text_t, output_len, temperature, top_p, top_k}, infer_args, 0));

Note how five input nodes are constructed and fed into the infer_op construction. All relevant model inference arguments are held by a blace::ml_core::InferenceArgsCollection object. Important: at this point, no model is loaded or executed; we merely define the execution structure.

Inference Backends

In blace.ai a backend is defined as the combination of a framework (like TorchScript or ONNX), a provider (CPU, CUDA, DirectML etc.) and a precision, and has the format <framework>_<provider>_<precision>, e.g. TORCHSCRIPT_CUDA_FP16. All supported backends are defined in blace::ml_core::Backend. When constructing an inference operator you pass a list of blace::ml_core::Backend to the blace::ml_core::ModelInferenceArgs. Upon execution, blace.ai goes through the list and runs the model on the first backend that is supported by both the model and the host. Please see the provided demo for an example.
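
For example, to prefer a CUDA FP16 TorchScript backend and fall back to a CPU backend when no CUDA device is available, you could pass an ordered list like the following sketch (the CPU backend name is illustrative; take the exact enum values from blace::ml_core::Backend):

blace::ml_core::InferenceArgsCollection infer_args;
// ordered by priority: the first backend supported by both model and host is used
infer_args.inference_args.backend = {
    blace::ml_core::TORCHSCRIPT_CUDA_FP16,  // preferred on CUDA-capable hosts
    blace::ml_core::TORCHSCRIPT_CPU_FP32    // illustrative fallback; check blace::ml_core::Backend for the exact name
};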

Graph Execution

Now that we have constructed the graph we can execute it. We do so by constructing a blace::computation_graph::GraphEvaluator from the last node (whose result we want to obtain) and running the evaluation:

auto answer = evaluator.evaluateToString().value();
std::cout << "Answer: " << answer << std::endl;
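
Putting both phases together for a trivial graph, a sketch could look like the following. Constructing the GraphEvaluator directly from the node of interest follows the description above; treat the exact constructor as an assumption and check graph_evaluator.h:

#include "blace_ai.h"
#include <iostream>

int main() {
  // graph construction: a single text node
  auto text_t = CONSTRUCT_OP_GET(blace::ops::FromTextOp("hello blace.ai"));

  // graph execution: evaluate the node we are interested in
  blace::computation_graph::GraphEvaluator evaluator(text_t); // assumed constructor, see graph_evaluator.h
  auto answer = evaluator.evaluateToString();                 // std::optional<std::string>

  if (answer)
    std::cout << "Answer: " << answer.value() << std::endl;

  return 0;
}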

If the evaluation fails, evaluateToString() returns std::nullopt, so check the optional before calling .value().

Error Handling

Our library will never throw exceptions at you. Instead, all calls to methods through the API wrap their result in a std::optional, which holds std::nullopt in case of failure. The error message is printed to the console in this case.
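
In practice this means checking the returned optional before using it, for example (a minimal sketch; the run_and_print helper is purely illustrative):

#include "blace_ai.h"
#include <iostream>

// evaluate a node and handle failure via the returned std::optional
int run_and_print(blace::computation_graph::GraphEvaluator& evaluator) {
  auto result = evaluator.evaluateToString(); // std::optional<std::string>
  if (!result.has_value()) {
    // blace.ai has already printed the error message to the console
    return 1;
  }
  std::cout << *result << std::endl;
  return 0;
}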

Using models from the hub

Our model hub contains a growing list of compatible models which you can integrate into your application with a few lines of code. Check Quickstart from Hub to learn how to run the provided demo projects.