Compliance Documentation (Verfahrensdokumentation)

Last updated: February 2026 - Version 1.0

1. General Information

System: Paperarchive - Cloud-based document management system
Operator / Responsible: Marcel Klein
Purpose: Digital archiving and management of business documents in accordance with the German GoBD regulations (Grundsätze zur ordnungsmäßigen Führung und Aufbewahrung von Büchern, Aufzeichnungen und Unterlagen in elektronischer Form sowie zum Datenzugriff)
Version: 1.0
Date: February 2026
Legal basis: GoBD (BMF letter of 28.11.2019, BStBl I p. 1269), HGB §257, AO §147

1.1 Purpose of this Documentation

This compliance documentation describes the technical and organizational measures that Paperarchive uses for the proper capture, processing, retention, and provision of digitized business documents. It serves as proof of GoBD compliance for tax authorities and auditors.

1.2 Scope

This documentation applies to all documents that are processed and archived within a GoBD-enabled Space in Paperarchive. GoBD compliance can be activated per Space and only applies to Spaces where this feature is enabled.

2. Organizational Framework

2.1 Responsible Persons

Role | Responsibility
System Administrator | Operation, maintenance, and development of the Paperarchive platform
Data Protection Officer | Monitoring compliance with data protection regulations
Space Owner (User) | Responsible for correct usage within their own Space, activation of GoBD mode, and ensuring proper document capture

2.2 Access Controls

Access to Paperarchive is protected by a multi-level authorization concept:

  • Authentication: JWT-based authentication via Supabase Auth. Each user receives a signed JSON Web Token after successful login, which is validated with every API request.
  • Authorization: Supabase Row Level Security (RLS) policies ensure that users can only access data assigned to their own account.
  • Space-based separation: Multiple Spaces (e.g. 'Business', 'Personal') can be created within an account. Each Space has its own settings, including GoBD activation.
  • No direct database access: End users have no direct access to the database. All operations go through the API layer with appropriate validation.

2.3 Tenant Separation

Each user works in isolated Spaces. Tenant separation is enforced at the database level through Supabase RLS:

  • Each record is assigned to a user and a Space.
  • RLS policies prevent access to other users' data, even with direct database queries.
  • Space assignment is automatically resolved during document processing - either from upload metadata, the existing document, or the user's default Space.
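
The resolution order described above can be read as a simple fallback chain. The following is a minimal sketch in Python, not the platform's production code (which runs on Node.js); the function name `resolve_space_id` and its parameters are hypothetical.

```python
from typing import Optional


def resolve_space_id(
    upload_space_id: Optional[str],
    existing_document_space_id: Optional[str],
    default_space_id: str,
) -> str:
    """Return the first available Space ID in the documented order."""
    # 1. Space given explicitly in the upload metadata
    if upload_space_id:
        return upload_space_id
    # 2. Space of the existing document entry (e.g. when a document is reprocessed)
    if existing_document_space_id:
        return existing_document_space_id
    # 3. The user's default Space
    return default_space_id
```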

3. Document Processing

3.1 Processing Pipeline Overview

Document processing follows a defined, five-stage pipeline. Each stage is logged and can be individually retried in case of errors.

Upload → [1] Validation → [2] Malware Scan → [3] OCR → [4] AI Analysis → [5] Embedding & Matching
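
As an illustration of how a staged pipeline with individually retryable stages might be organized, the following Python sketch runs the stages in order and reports the stage that failed, so that a retry can resume from exactly that stage. The stage identifiers, the `run_pipeline` function, and the handler signature are assumptions made for this example; the actual implementation is not described in this documentation.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stage order taken from the overview above; the identifiers are hypothetical.
STAGES: List[str] = [
    "validation",
    "malware_scan",
    "ocr",
    "ai_analysis",
    "embedding_matching",
]


def run_pipeline(
    handlers: Dict[str, Callable[[dict], None]],
    document: dict,
    start_at: str = "validation",
) -> Tuple[List[str], Optional[str]]:
    """Run the stages in order, beginning at `start_at`.

    Returns (completed_stages, failed_stage). If a stage raises, processing
    stops and the stage name is returned so that the caller can retry with
    start_at=failed_stage.
    """
    completed: List[str] = []
    for stage in STAGES[STAGES.index(start_at):]:
        try:
            handlers[stage](document)
        except Exception:
            return completed, stage
        completed.append(stage)
    return completed, None
```

A retry then calls `run_pipeline(handlers, document, start_at=failed_stage)`, which skips the stages that already succeeded.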

3.2 Stage 1: Validation

The following checks are performed before processing:

  • File type validation: Only permitted file types are accepted (PDF, JPEG, PNG, TIFF, GIF, WebP, BMP, SVG).
  • File size validation: Maximum file size is 50 MB.
  • Space assignment: The document is assigned to the correct Space (from upload metadata, existing document entry, or default Space).
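
The file type and size checks can be sketched as follows. This is an illustrative Python example under stated assumptions: the check is done by file extension (the platform may inspect MIME types or file signatures instead), the 50 MB limit is interpreted as 50 × 1024 × 1024 bytes, and the name `validate_upload` is hypothetical.

```python
import os
from typing import List

# Extensions for the permitted file types listed above (assumed mapping).
ALLOWED_EXTENSIONS = {
    ".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff",
    ".gif", ".webp", ".bmp", ".svg",
}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # 50 MB, interpreted here as MiB


def validate_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of validation errors; an empty list means the file is accepted."""
    errors: List[str] = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"file type not permitted: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        errors.append("file exceeds the 50 MB limit")
    return errors
```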

3.3 Stage 2: Malware Scan

  • Uploaded files are scanned for malware using ClamAV.
  • If malware is detected, the file is immediately deleted from storage and processing is aborted.
  • Only permitted scan commands are accepted to prevent command injection.

3.4 Stage 3: Text Recognition - OCR

  • Text recognition is performed using the Google Cloud Vision API.
  • The API is accessed via the EU endpoint (eu-vision.googleapis.com) to ensure data residency within the EU.
  • Extracted text is split into layout-aware chunks with bounding box information.
  • Maximum page count for OCR: 5 pages per document.

3.5 Stage 4: AI Analysis

The AI analysis extracts structured information from the OCR text:

  • Provider: Azure OpenAI (EU region) as primary provider, OpenAI as fallback.
  • Extracted information: Document title, date, category suggestion, sender recognition, key data (amounts, due dates, reference numbers).
  • Confidence-based processing: Each extracted field receives a confidence score (0.0 - 1.0). Fields with low confidence generate events that must be confirmed or corrected by the user.
  • Pattern Learning: New documents are matched against the user's existing patterns to continuously improve recognition accuracy.
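
Confidence-based processing can be illustrated with a short Python sketch that selects the fields requiring user confirmation. The threshold of 0.8 is an assumed value, since this documentation does not state the actual threshold, and the function name is hypothetical.

```python
from typing import Dict, List

REVIEW_THRESHOLD = 0.8  # assumed value; not specified in this documentation


def fields_needing_review(
    confidences: Dict[str, float],
    threshold: float = REVIEW_THRESHOLD,
) -> List[str]:
    """Return the names of extracted fields whose confidence is below the threshold."""
    for name, score in confidences.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"confidence for {name!r} is outside 0.0-1.0: {score}")
    return sorted(name for name, score in confidences.items() if score < threshold)
```

For each returned field, the system would then create an event that the user confirms or corrects.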

3.6 Stage 5: Embedding and Matching

  • Vector embeddings are generated by a local embedding service (no data transfer to external services).
  • Similar documents are identified based on vector distance to optimize processing.
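
The documentation does not specify which distance metric is used. As one common choice, the following Python sketch uses cosine distance to find the closest existing document; the metric, the function names, and the in-memory candidate dictionary are assumptions for illustration only.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 0.0 for vectors pointing the same way, 1.0 for orthogonal ones."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_document(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> Optional[Tuple[str, float]]:
    """Return (document_id, distance) of the closest candidate, or None if there are none."""
    if not candidates:
        return None
    scored = ((doc_id, cosine_distance(query, vec)) for doc_id, vec in candidates.items())
    return min(scored, key=lambda pair: pair[1])
```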

3.7 Integrity Assurance at Upload

  • SHA-256 Hash: Upon upload, a SHA-256 hash of the original file is computed and stored in the database.
  • Verification: File integrity can be verified at any time by comparing the stored hash with a newly computed hash of the file in storage.
  • Tampering detection: In case of a hash mismatch, a warning is logged. This enables detection of subsequent changes to the original file.
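
The hash computation and verification can be sketched in a few lines of Python using the standard library. The function names and the logger name are hypothetical; how the platform reads the file from storage is not shown.

```python
import hashlib
import logging

logger = logging.getLogger("integrity")


def sha256_hex(data: bytes) -> str:
    """Compute the SHA-256 hash of the file content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(document_id: str, stored_hash: str, current_file_bytes: bytes) -> bool:
    """Compare the stored hash with a newly computed hash of the file in storage."""
    matches = sha256_hex(current_file_bytes) == stored_hash.lower()
    if not matches:
        # A mismatch indicates that the file was changed after upload.
        logger.warning("hash mismatch for document %s", document_id)
    return matches
```

Even a single changed byte in the file produces a different hash, so the comparison fails for any modification of the original.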

3.8 Processing Step Logging

Each processing run is documented with trace ID, duration per stage, total duration, and any error information.
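
A processing record with these attributes might look like the following Python sketch. The class `ProcessingTrace`, its field names, and the use of milliseconds are assumptions for illustration; the actual log schema is not described here.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ProcessingTrace:
    """One processing run: trace ID, duration per stage, and error information."""

    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    stage_durations_ms: Dict[str, float] = field(default_factory=dict)
    error: Optional[str] = None

    @property
    def total_duration_ms(self) -> float:
        return sum(self.stage_durations_ms.values())

    def run_stage(self, name: str, stage: Callable[[], None]) -> bool:
        """Run one stage, record its duration, and capture any error."""
        started = time.perf_counter()
        try:
            stage()
            return True
        except Exception as exc:
            self.error = f"{name}: {exc}"
            return False
        finally:
            # The duration is recorded for successful and failed stages alike.
            self.stage_durations_ms[name] = (time.perf_counter() - started) * 1000.0
```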

4. Retention and Immutability

4.1 GoBD Mode

GoBD mode can be activated per Space. When GoBD mode is active, the following extended rules apply:

  • Retention periods are automatically applied.
  • Archived documents are subject to change protection.
  • Deletions during an active retention period are blocked.

4.2 Retention Periods

Retention periods are based on the document type and the applicable legal basis:

Document type | Retention period | Legal basis
Invoices | 10 years | HGB §257 Abs. 1 Nr. 1, 4
Accounting vouchers | 10 years | HGB §257 Abs. 1 Nr. 1
Annual financial statements | 10 years | HGB §257 Abs. 1 Nr. 1
Commercial letters (received/sent) | 6 years | AO §147 Abs. 1 Nr. 2, 3
Business letters | 6 years | HGB §257 Abs. 1 Nr. 2, 3
Contracts (tax-relevant) | 10 years | AO §147 Abs. 1 Nr. 4a
Other tax-relevant documents | 10 years | AO §147 Abs. 1 Nr. 5

The retention period begins at the end of the calendar year in which the document was created or received. The expiry date is calculated automatically: the retention period in years is added to the year of the document date, and the expiry date is set to December 31 of the resulting year.
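
As a worked example of this rule: an invoice dated March 15, 2024 with a 10-year retention period expires on December 31, 2034. The calculation can be sketched in Python as follows; the function name `retention_expiry` is hypothetical.

```python
from datetime import date


def retention_expiry(document_date: date, retention_years: int) -> date:
    """Expiry date: December 31 of (year of the document date + retention years)."""
    if retention_years < 0:
        raise ValueError("retention_years must not be negative")
    return date(document_date.year + retention_years, 12, 31)


# Invoice dated 2024-03-15, 10 years retention -> expires 2034-12-31
print(retention_expiry(date(2024, 3, 15), 10))
```

Because the period starts at the end of the calendar year, the day and month of the document date do not affect the result.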

4.3 Archive Lock (Change Protection)

Documents in GoBD-enabled Spaces are subject to change protection after archiving:

  • Protected fields: After archiving, certain fields (e.g. original file, document date, amounts) can no longer be modified.
  • Archive timestamp: The time of archiving is recorded and cannot be altered.
  • Original file immutability: The original file in storage is secured by the SHA-256 hash. Any change would be detected during an integrity check.
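
A check for protected fields might be sketched as follows in Python. The field names in `PROTECTED_FIELDS` and the `archived_at` key are hypothetical; the documentation names the protected fields only by example, and the platform may enforce the lock at the database level instead.

```python
from typing import Any, Dict, List

# Hypothetical field names for the examples given above.
PROTECTED_FIELDS = frozenset({"original_file_path", "document_date", "amount"})


def rejected_changes(document: Dict[str, Any], changes: Dict[str, Any]) -> List[str]:
    """Return the protected fields that `changes` would alter on an archived document.

    An empty list means the update may proceed.
    """
    if document.get("archived_at") is None:
        return []  # not archived yet: no change protection applies
    return sorted(
        name
        for name, new_value in changes.items()
        if name in PROTECTED_FIELDS and document.get(name) != new_value
    )
```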

4.4 Deletion Protection

Documents with an active retention period cannot be deleted:

  • Before each deletion, the system checks whether the document belongs to a GoBD-enabled Space.
  • Archived documents cannot be deleted.
  • Documents with a running retention period cannot be deleted.
  • Deletion is only permitted after the retention period has expired.
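
The deletion rules can be combined into a single decision function, sketched here in Python. The sketch rests on two interpretive assumptions that are not confirmed by this documentation: archived documents become deletable once their retention period has expired, and archived documents without a known expiry date remain protected. The function name and parameters are hypothetical.

```python
from datetime import date
from typing import Optional


def may_delete(
    gobd_enabled: bool,
    archived: bool,
    retention_expiry: Optional[date],
    today: date,
) -> bool:
    """Decide whether a document may be deleted under the rules above."""
    if not gobd_enabled:
        return True  # deletion protection applies only in GoBD-enabled Spaces
    if retention_expiry is not None and today <= retention_expiry:
        return False  # retention period is still running
    if archived and retention_expiry is None:
        return False  # assumption: archived without known expiry stays protected
    return True
```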

5. Audit Trail

5.1 Principle

All security- and compliance-relevant changes to documents are logged in an immutable audit log table. The audit trail is active regardless of the GoBD toggle - changes are logged in all Spaces.

5.2 Append-Only Principle

The audit log follows the append-only principle:

  • Entries can only be added (INSERT).
  • UPDATE and DELETE operations are prevented by database triggers.
  • Existing entries cannot be modified or deleted retroactively.
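
The trigger-based approach can be demonstrated with a small, runnable Python example. It uses an in-memory SQLite database purely so that the example is self-contained; the production system uses PostgreSQL, where the triggers would be written differently. The table layout, trigger names, and helper functions are hypothetical.

```python
import sqlite3


def create_audit_log() -> sqlite3.Connection:
    """Create an in-memory audit table that rejects UPDATE and DELETE."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE audit_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            action TEXT NOT NULL,
            document_id TEXT NOT NULL,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
        BEGIN
            SELECT RAISE(ABORT, 'audit_log is append-only');
        END;
        CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
        BEGIN
            SELECT RAISE(ABORT, 'audit_log is append-only');
        END;
        """
    )
    return conn


def is_rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the database refuses to execute the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.DatabaseError:
        return True
```

With this setup, an INSERT succeeds, while any UPDATE or DELETE on an existing entry is aborted by the database itself, independent of the application code.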

5.3 Logged Actions

The following actions are recorded in the audit log:

  • Document metadata changes (with old and new values)
  • Document deletions
  • Document archival
  • Storage object deletions

5.4 Fault Tolerance

Audit logging is fault-tolerant: an error while writing an audit entry does not abort the main operation. Instead, the error is recorded in the application log, so that operational issues do not block business processes.
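
This behavior can be sketched in Python as a wrapper that isolates the audit write from the main operation. The function `with_audit` and the logger name are hypothetical and only illustrate the principle.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("audit")


def with_audit(operation: Callable[[], T], write_audit_entry: Callable[[], None]) -> T:
    """Run the main operation, then try to write the audit entry.

    Errors in the main operation propagate as usual. Errors in the audit
    write are recorded in the application log and do not affect the result.
    """
    result = operation()
    try:
        write_audit_entry()
    except Exception:
        logger.exception("audit entry could not be written")
    return result
```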

6. Backup and Recovery

6.1 Database Backups

  • Provider: Supabase (Managed PostgreSQL)
  • Frequency: Daily automatic backups
  • Point-in-Time Recovery (PITR): Recovery to any point in time within the backup window is possible
  • Backup location: EU region (Frankfurt, Germany)

6.2 File Storage

  • Storage: Supabase Storage (S3-compatible)
  • Redundancy: Files are stored redundantly
  • Data residency: EU region (Frankfurt, Germany)

6.3 Recovery Procedure

In case of data loss or corruption:

  1. Point-in-Time Recovery of the database to the desired point in time
  2. Integrity check of recovered documents using the stored SHA-256 hashes
  3. Verification of audit log completeness

7. Data Protection and Security

7.1 Transport Encryption

  • All connections between client and server are TLS-encrypted (HTTPS).
  • Internal communication between backend services and database is also encrypted.

7.2 Access Control

  • Row Level Security (RLS): Supabase RLS policies enforce data isolation at the database level. Every query is automatically restricted to the authenticated user's data.
  • JWT Authentication: Every API request requires a valid, signed JWT.
  • Rate Limiting: API endpoints are protected by rate limiting to prevent abuse.

7.3 Input Validation

  • All API inputs are validated and sanitized.
  • File uploads are checked for permitted file types and sizes.
  • Document IDs are validated as UUIDs to prevent path traversal attacks.
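
UUID validation of document IDs can be sketched in Python as follows. Because a valid UUID contains only hexadecimal digits and hyphens, an accepted ID cannot carry path components such as `../`. The function name is hypothetical, and requiring the canonical hyphenated form is an additional assumption of this example.

```python
import uuid


def parse_document_id(raw: str) -> uuid.UUID:
    """Accept only canonical UUID strings; reject everything else."""
    try:
        parsed = uuid.UUID(raw)
    except (ValueError, AttributeError, TypeError):
        raise ValueError("document ID is not a valid UUID") from None
    # uuid.UUID also accepts braces, a urn:uuid: prefix, and missing hyphens;
    # require the canonical form so that the ID is safe to use in storage paths.
    if str(parsed) != raw.lower():
        raise ValueError("document ID is not in canonical UUID form")
    return parsed
```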

7.4 Data Minimization with External Services

  • Google Cloud Vision: EU endpoint only. Only image data for text recognition is transmitted - no metadata or user information.
  • Azure OpenAI: EU region. Only the extracted OCR text is transmitted for analysis, not the original file.
  • Local embedding service: Vector embeddings are generated locally on the server. No data is transferred to external services.

8. Technical Infrastructure

8.1 Server Infrastructure

Server: Hetzner VM (Location: EU, Germany)
Operating system: Linux (Ubuntu)
Runtime: Node.js
Framework: Express.js

8.2 Database and Storage

Database: Supabase (PostgreSQL)
File storage: Supabase Storage (S3-compatible)
Region: EU region (Frankfurt, Germany)

8.3 External Services

Service | Purpose | Region
Google Cloud Vision | OCR / Text recognition | EU endpoint
Azure OpenAI | AI analysis (GPT-4o) | EU
OpenAI | Fallback for AI analysis | -
ClamAV | Malware scanning | Local (same server)

8.4 Deployment and CI/CD

  • Version control: Git (GitHub)
  • Deployment trigger: Push to the main branch
  • CI/CD pipeline: GitHub Actions
  • No manual deployment in regular operations: All changes go through the CI/CD pipeline.

9. Change History

Version | Date | Change | Responsible
1.0 | February 2026 | Initial version | Marcel Klein

10. Appendix