# Alpaca

# Introduction

# Functional requirements

The following requirements should be fulfilled:

- Fully featured log-management system
    - Receive logs in ANY log format
    - Convert log formats to support SIEM requirements
    - Filter unwanted logs
    - Protect pipelines from flooding
- Should fully integrate with QRadar and other SIEMs
- The following functionality should be available:
    - Buffering (in case of congestion, network outage, or component failure)
    - Filtering (should be possible anywhere in the pipeline)
    - Logs should be searchable in a database-like data lake
    - Logs should be stored to cold storage
    - Encryption of data in transit AND data at rest should be supported
    - High availability: the system should be able to fully recover from any type of intermittent failure
    - Redundancy: components should be replaceable without service degradation
    - The solution should be platform-independent (OS/hardware agnostic)
    - Components must be supported on the latest OS versions and patch levels
    - Components should be in active development/support
    - The platform should support log transformation to meet QRadar log standards
    - Each part of the data pipeline should be auditable/monitorable
    - Multi-tenancy
    - Proven technology

# Design Principles

To narrow down the options, we also decided on some principles the solution should follow:

- No Java unless thoroughly tested
- No Docker/containers
- No fancy-schmancy Python code
- Runs on Linux
- Both x86 and ARM support for key components
- Deployable using industry-standard deployment tools (Ansible/Chef/Puppet)

# Basic pipeline architecture

[![alpaca_highlevel_2024.png](https://wiki.gadgetlabs.nl/uploads/images/gallery/2026-02/scaled-1680-/alpaca-highlevel-2024.png)](https://wiki.gadgetlabs.nl/uploads/images/gallery/2026-02/alpaca-highlevel-2024.png)

At each (critical) step, data is written to storage (which will be HA/redundant) so that no data is lost when a critical failure occurs.  
The amount of data held in memory is kept as small as possible.
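
As an illustration of the write-to-storage principle, the sketch below shows how a Vector sink can be given a disk-backed buffer so that queued events survive a restart. It is a minimal example and not the ALPACA configuration: the component names (`syslog_in`, `to_kafka`), the listen address, the broker host, the topic name, and the buffer size are all placeholders chosen for this example.

```yaml
# Illustrative sketch only: names, addresses and sizes are assumptions.
data_dir: /var/lib/vector        # disk buffers are stored under this directory

sources:
  syslog_in:                     # hypothetical source name
    type: syslog
    address: "0.0.0.0:5514"      # placeholder listen address
    mode: tcp

sinks:
  to_kafka:                      # hypothetical sink name
    type: kafka
    inputs: ["syslog_in"]
    bootstrap_servers: "kafka-1.example.internal:9092"   # placeholder broker
    topic: "raw-logs"            # placeholder topic
    encoding:
      codec: json
    buffer:
      type: disk                 # keep queued events on disk, not in memory
      max_size: 1073741824       # bytes (1 GiB); size it to the outage you must survive
      when_full: block           # apply backpressure instead of dropping events
```

With `when_full: block`, a full buffer slows down upstream components rather than discarding data; `drop_newest` is the alternative when losing events is preferable to stalling the pipeline.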

# Base components

Based on extensive research and past experience, the following software stacks have been selected as the preferred components for building the new solution.

- Vector ([https://vector.dev](https://vector.dev)) as the log-management core
- Kafka as a high-speed buffering solution
- OpenSearch as the core data lake (or data-lake alternative), and for metrics and dashboarding
- Ansible (deployer for configuration and setup)

Other packages will be selected depending on need or to handle specific use-cases.

# Download

# Vector

Vector sits at the core of ALPACA and does most of the heavy lifting.

It can be downloaded at: [www.vector.dev](https://www.vector.dev "Datadog Vector")
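
To give an idea of what this heavy lifting can look like, the fragment below sketches three Vector transforms that map onto the functional requirements: filtering unwanted logs, protecting the pipeline from flooding, and reshaping events. It is a hedged example, not the ALPACA configuration: the component names, the input `syslog_in`, the threshold, and the added fields are assumptions made for illustration.

```yaml
# Illustrative sketch only: component names, inputs and values are assumptions.
transforms:
  drop_debug:                    # filtering: only events matching the condition pass
    type: filter
    inputs: ["syslog_in"]        # hypothetical upstream source
    condition: '.severity != "debug"'

  rate_limit:                    # flood protection
    type: throttle
    inputs: ["drop_debug"]
    threshold: 5000              # allow at most 5000 events per window
    window_secs: 1               # window length in seconds; excess events are dropped

  normalize:                     # transformation using Vector Remap Language (VRL)
    type: remap
    inputs: ["rate_limit"]
    source: |
      .tenant = "tenant-a"       # placeholder tenant tag
      .pipeline = "alpaca"       # placeholder marker field
```

Because each transform names its own `inputs`, a filter like this can be placed at any point in the pipeline.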

# Apache Kafka

Data is buffered between each major operation.

Alpaca uses Apache Kafka for this, as a proven, reliable, and scalable solution.

It can be downloaded here: [https://kafka.apache.org/](https://kafka.apache.org/)
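
A downstream pipeline stage could pick the buffered data up again with Vector's `kafka` source, roughly as sketched below. The broker address, consumer group, topic, and component names are placeholders, and the `console` sink is only there to make the fragment a complete, inspectable configuration.

```yaml
# Illustrative sketch only: broker, group, topic and names are assumptions.
sources:
  from_kafka:                    # hypothetical source name
    type: kafka
    bootstrap_servers: "kafka-1.example.internal:9092"   # placeholder broker
    group_id: "alpaca-stage-2"   # placeholder consumer group
    topics: ["raw-logs"]         # placeholder topic
    decoding:
      codec: json                # assumes the producing stage wrote JSON

sinks:
  debug_out:                     # prints events to stdout for inspection
    type: console
    inputs: ["from_kafka"]
    encoding:
      codec: json
```

Decoupling stages through a topic means a stage can be stopped or replaced while the preceding stage keeps writing to Kafka.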

# OpenSearch

OpenSearch is used for the data lake and dashboarding.

It can be downloaded here: [https://opensearch.org/](https://opensearch.org/)
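
One way to ship events into OpenSearch is Vector's `elasticsearch` sink, which speaks the bulk API. The fragment below is a sketch under assumptions: the endpoint, index pattern, user name, environment variable, and the input `from_kafka` are all placeholders, and compatibility between the sink and your OpenSearch version should be verified before relying on it.

```yaml
# Illustrative sketch only: endpoint, index, credentials and names are assumptions.
sinks:
  to_opensearch:                 # hypothetical sink name
    type: elasticsearch
    inputs: ["from_kafka"]       # hypothetical upstream component
    endpoints: ["https://opensearch-1.example.internal:9200"]   # placeholder
    mode: bulk
    bulk:
      index: "alpaca-logs-%Y.%m.%d"   # placeholder daily index pattern
    auth:
      strategy: basic
      user: "vector_writer"           # placeholder user
      password: "${OPENSEARCH_PASSWORD}"   # read from the environment
```

Using an `https` endpoint keeps the data encrypted in transit, in line with the functional requirements.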

# Installer

The installer is a set of Ansible playbooks.

A complete tar file can be downloaded here and placed in a pre-existing Ansible environment.

An installer for an "All-in-one" server (single-node) can be found here: &lt;TODO&gt;
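
To show the general shape such a playbook might take, here is a minimal sketch that installs and starts Vector. It is not the ALPACA installer: the host group `alpaca_collectors`, the template `vector.yaml.j2`, and the assumption that a `vector` package is available from an already-configured repository are all hypothetical.

```yaml
# Illustrative sketch only: host group, template name and package source are assumptions.
- name: Install and start Vector
  hosts: alpaca_collectors       # hypothetical inventory group
  become: true
  tasks:
    - name: Install the vector package
      ansible.builtin.package:
        name: vector             # assumes the package repository is already set up
        state: present

    - name: Deploy the Vector configuration
      ansible.builtin.template:
        src: vector.yaml.j2      # hypothetical template in the playbook directory
        dest: /etc/vector/vector.yaml
        mode: "0640"
      notify: Restart vector

    - name: Enable and start vector
      ansible.builtin.service:
        name: vector
        state: started
        enabled: true

  handlers:
    - name: Restart vector
      ansible.builtin.service:
        name: vector
        state: restarted
```

It could be run with `ansible-playbook -i inventory playbook.yml`, where both file names are placeholders.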

# Installation

# Configuration

# Monitoring

# Integration