Update README.md

This commit is contained in:
daniel.g 2025-09-28 10:01:13 +00:00
parent fa0d76d216
commit f805a51f5d


# Intesa Logs Project Documentation

This repo implements a small, production-style pipeline that inspects bank transfer (“**bonifico**”) logs, looks for anomalies (e.g., **rejected EUR ≥ 10,000**, **`vop_no_match`**, **invalid IBAN/BIC**), and produces a concise report (optionally emailed).

It runs **locally via Docker** and is designed to be **deployable to Azure** using the same containers.

> Local mode uses `SINK=file` and a shared Docker volume. **No Azure Storage or Queues** are used in this mode.
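The anomaly rules listed above can be sketched as a single predicate. This is a minimal illustration, not the analyzer's actual implementation; the event field names (`status`, `currency`, `amount`, `vop_result`, `iban`, `bic`) and the regex checks are assumptions:

```python
import re

# Rough shape checks; real IBAN/BIC validation is stricter (checksums, country lengths).
IBAN_RE = re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$")
BIC_RE = re.compile(r"^[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?$")

def is_anomalous(event: dict) -> bool:
    """Flag a bonifico event matching any of the documented rules."""
    # Rule 1: rejected EUR transfer of 10,000 or more
    if (event.get("status") == "rejected"
            and event.get("currency") == "EUR"
            and float(event.get("amount", 0)) >= 10_000):
        return True
    # Rule 2: Verification-of-Payee mismatch
    if event.get("vop_result") == "vop_no_match":
        return True
    # Rule 3: malformed IBAN or BIC
    if not IBAN_RE.match(event.get("iban", "")):
        return True
    if not BIC_RE.match(event.get("bic", "")):
        return True
    return False

ok = {"status": "settled", "currency": "EUR", "amount": 50,
      "vop_result": "match", "iban": "IT60X0542811101000000123456", "bic": "BCITITMM"}
bad = {**ok, "status": "rejected", "amount": 12_000}
print(is_anomalous(ok), is_anomalous(bad))  # → False True
```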
---
## High-level flow
**Splunk (HEC)** → **Poller** → *(Chunks: file or Azure Blob)* → *(Optional: Azure Queue message)* → **Analyzer API** → *(Optional: Email via Mailtrap)*
- **Local mode:** Poller writes chunk **files** to a shared volume. Analyzer reads those files directly.
- **Azure mode (final target):** Poller uploads **blobs** to Storage (`bank-logs`) and enqueues messages to Storage Queue (`log-chunks`). A **Queue Worker** consumes queue messages and calls the Analyzer API.
---
## Current state snapshot (what's running now)
### ✅ Running in Azure
- **App Service (Agent API)**
- Name: `tf-in-dev-chatapp-app`
- Image: `tfindevacr.azurecr.io/agent-api:prod` (pulled from ACR via Managed Identity)
- Public endpoint: `https://tf-in-dev-chatapp-app.azurewebsites.net`
  - Health: `GET /health` → `{"status":"ok"}`
- API: `POST /analyze` (see examples below)
- **Azure Container Registry (ACR)**
- Name: `tfindevacr`
- Repos/tags present:
- `agent-api:prod`
- `queue-worker:prod` ✅ *(built & pushed; not yet deployed)*
- **Azure Storage (data plane in use)**
- Storage account: `tfindevst`
- **Blob container:** `bank-logs` (holds `.jsonl` or `.jsonl.gz` chunks)
- **Queue:** `log-chunks` (messages the worker consumes)
> The API is live in Azure. The **worker** and **Splunk** are still local right now.
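Since `bank-logs` holds chunks as either `.jsonl` or `.jsonl.gz`, any consumer has to handle both forms. A small sketch of that step (the helper name is hypothetical; it only shows the decompress-then-parse logic, not how the blob is downloaded):

```python
import gzip
import json

def decode_chunk(blob_name: str, data: bytes) -> list[dict]:
    """Turn a chunk downloaded from the bank-logs container into events:
    gunzip when the name ends in .gz, then parse the NDJSON lines."""
    if blob_name.endswith(".gz"):
        data = gzip.decompress(data)
    return [json.loads(line)
            for line in data.decode("utf-8").splitlines() if line]
```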
### ✅ Running locally (Docker Compose)
- **Splunk** container (HEC exposed)
- **Poller** (`splunk_poller.py`)
- You can run it in either:
- `SINK=file` → write chunks to local volume (simple local dev), or
- `SINK=blob+queue` → upload to Azure Blob + enqueue Azure Queue (production-like)
- **Queue Worker** (`worker/`)
- Currently running **locally**, reading Azure Storage Queue and calling the Analyzer (either local API or Azure API based on `ANALYZER_URL`).
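The worker's core loop is: dequeue a message from `log-chunks`, resolve the chunk it points at, and call the Analyzer selected by `ANALYZER_URL`. A hedged sketch of the message-handling step only; the message schema and the default URL are assumptions, not the worker's actual contract:

```python
import base64
import json
import os

# ANALYZER_URL picks the target: the local container or the Azure App Service
# endpoint. The default shown here is an assumption.
ANALYZER_URL = os.environ.get("ANALYZER_URL", "http://agent-api:5000")

def parse_queue_message(raw: str) -> dict:
    """Azure Storage Queue clients frequently base64-encode message text;
    accept both encoded and plain JSON payloads."""
    try:
        raw = base64.b64decode(raw, validate=True).decode("utf-8")
    except Exception:
        pass  # payload was already plain JSON
    return json.loads(raw)
```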
---
## Quick start (TL;DR)
```bash
# 1) Create a .env (see sample below)
# 2) In compose.yaml, set SINK=file for the poller (local mode) or SINK=blob / SINK=blob+queue (Azure mode)
# 3) Start the stack
docker compose up -d