From f805a51f5d05066b3585910bd1e315d4dcccf473 Mon Sep 17 00:00:00 2001
From: "daniel.g"
Date: Sun, 28 Sep 2025 10:01:13 +0000
Subject: [PATCH] Update README.md

---
 README.md | 66 +++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 50 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index e29174e..d637b34 100644
--- a/README.md
+++ b/README.md
@@ -1,27 +1,61 @@
-# Intesa Logs – Local Docker Setup (Azure bits left empty)
+# Intesa Logs – Project Documentation
 
-This repo runs a local pipeline that mimics production **end-to-end**, but **without any active Azure dependencies**.
-All “Azure things” are left as **placeholders** so this same repo can later be deployed to Azure.
+This repo implements a small, production-style pipeline that inspects bank transfer (“**bonifico**”) logs, looks for anomalies (e.g., **rejected EUR ≥ 10,000**, **`vop_no_match`**, **invalid IBAN/BIC**), and produces a concise report (optionally emailed).
 
-## What runs locally (currently)
+It runs **locally via Docker** and is designed to be **deployable to Azure** using the same containers.
 
-1. **Splunk** (container) – receives events via HEC.
-2. **Poller** (`splunk_poller.py`) – queries Splunk and writes newline-delimited JSON **chunks** to a shared volume.
-3. **Agent API** (`flask_app.py`) – reads chunks and produces a concise compliance/ops report (optionally emails it via Mailtrap).
-
-> Local mode uses `SINK=file` and a shared Docker volume. **No Azure Storage or Queues** are used in this mode.
-
-## What runs on Azure (currently)
-
-1. **Queue-worker**
-2. **Agent API**
 
 ---
 
-## Quick start (TL;DR)
+
+## High-level flow
+
+**Splunk (HEC)** → **Poller** → *(Chunks: file or Azure Blob)* → *(Optional: Azure Queue message)* → **Analyzer API** → *(Optional: Email via Mailtrap)*
+
+- **Local mode:** Poller writes chunk **files** to a shared volume. Analyzer reads those files directly.
+- **Azure mode (final target):** Poller uploads **blobs** to Storage (`bank-logs`) and enqueues messages to Storage Queue (`log-chunks`). A **Queue Worker** consumes queue messages and calls the Analyzer API.
+
+---
+
+## Current state snapshot (what’s running now)
+
+### ✅ Running in Azure
+
+- **App Service (Agent API)**
+  - Name: `tf-in-dev-chatapp-app`
+  - Image: `tfindevacr.azurecr.io/agent-api:prod` (pulled from ACR via Managed Identity)
+  - Public endpoint: `https://tf-in-dev-chatapp-app.azurewebsites.net`
+  - Health: `GET /health` → `{"status":"ok"}`
+  - API: `POST /analyze` (see examples below)
+
+- **Azure Container Registry (ACR)**
+  - Name: `tfindevacr`
+  - Repos/tags present:
+    - `agent-api:prod` ✅
+    - `queue-worker:prod` ✅ *(built & pushed; not yet deployed)*
+
+- **Azure Storage (data plane in use)**
+  - Storage account: `tfindevst`
+  - **Blob container:** `bank-logs` (holds `.jsonl` or `.jsonl.gz` chunks)
+  - **Queue:** `log-chunks` (messages the worker consumes)
+
+> The API is live in Azure. The **worker** and **Splunk** are still local right now.
+
+### ✅ Running locally (Docker Compose)
+
+- **Splunk** container (HEC exposed)
+- **Poller** (`splunk_poller.py`)
+  - You can run it in either:
+    - `SINK=file` → write chunks to a local volume (simple local dev), or
+    - `SINK=blob+queue` → upload to Azure Blob + enqueue Azure Queue (production-like)
+- **Queue Worker** (`worker/`)
+  - Currently running **locally**, reading the Azure Storage Queue and calling the Analyzer (either the local API or the Azure API, based on `ANALYZER_URL`).
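+
+**Example calls (sketch).** The `/health` probe below matches the endpoint documented above. The `/analyze` request is illustrative only: the real request schema is defined by `flask_app.py`, so the `blob_name` field is a placeholder, not a confirmed field name.
+
+```bash
+# Liveness check against the deployed Agent API; expected response: {"status":"ok"}
+curl -s https://tf-in-dev-chatapp-app.azurewebsites.net/health
+
+# Trigger an analysis run (placeholder payload; adjust to whatever schema flask_app.py expects)
+curl -s -X POST https://tf-in-dev-chatapp-app.azurewebsites.net/analyze \
+  -H "Content-Type: application/json" \
+  -d '{"blob_name": "chunk-0001.jsonl"}'
+```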
+
+---
+
+## Repo structure
 
 ```bash
 # 1) Create a .env (see sample below)
-# 2) Make sure compose.yaml has SINK=file for the poller
+# 2) Make sure compose.yaml has SINK=file for the poller (local mode) or SINK=blob / SINK=blob+queue (Azure mode)
 # 3) Start the stack
 docker compose up -d
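+
+# 4) Optional sanity checks (a sketch: assumes the Analyzer is published on localhost:5000 and
+#    that the compose services are named "poller" and "agent-api"; adjust to your compose.yaml)
+curl -s http://localhost:5000/health
+docker compose logs -f poller agent-api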