Metadata-Driven File Ingestion and Quarantine for Metropolis Central Data Inbox (Microsoft Fabric)

Company Background
The City of Metropolis launched a city-wide Lakehouse initiative to centralize data from departments such as Police, Transportation, and 311 Services. A shared cloud-based Central Data Inbox was created as a single entry point for incoming files. Due to the lack of automation and standards, the inbox became congested, impacting analytics efficiency and data governance.

Current Situation
All departments delivered files to the same inbox. Analysts manually reviewed files and triggered pipelines, which led to duplicate processing, skipped files, unsupported formats being ignored, and no audit trail. As data volumes increased, analytics and reporting slowed.

Key Challenges
The inbox had no automated file detection or classification and relied heavily on manual processes. Because processed files were never cleaned up, the same files were ingested more than once. Unsupported formats went untracked, auditability was poor, and downstream data availability was inconsistent.

Objective
Design a single automated, metadata-driven ingestion framework that dynamically scans incoming files, routes them based on format, ingests only valid data into Lakehouse tables, quarantines unsupported files, cleans the inbox after processing, and maintains full traceability.

Solution Architecture
A Microsoft Fabric pipeline dynamically scans the Central Data Inbox, evaluates file metadata at runtime, routes files using conditional logic, ingests validated data into Lakehouse Delta tables, quarantines unsupported formats, and automatically cleans the inbox to prevent reprocessing.
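The routing decision described above can be illustrated outside the pipeline with a few lines of Python. This is only a sketch of the conditional logic, not the pipeline definition itself: the function name `route_file` and the set of supported extensions are assumptions made for illustration, since the supported formats are not listed here.

```python
from pathlib import PurePosixPath

# Assumed set of supported formats (hypothetical; not taken from the case study).
SUPPORTED_EXTENSIONS = {".csv", ".json", ".parquet"}


def route_file(file_name: str) -> str:
    """Return 'ingest' for a supported format and 'quarantine' for anything else."""
    # Compare extensions case-insensitively so "DATA.CSV" and "data.csv" route the same way.
    extension = PurePosixPath(file_name).suffix.lower()
    return "ingest" if extension in SUPPORTED_EXTENSIONS else "quarantine"
```

Sending every unrecognized file to quarantine, including files with no extension at all, means nothing is silently ignored, which is the failure mode the manual process suffered from.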

Metadata-Driven Ingestion
A Get Metadata activity retrieves a complete inventory of incoming files at runtime. This removes dependency on file naming conventions and ensures every file is evaluated, enabling scalable and automated ingestion.
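When a Get Metadata activity is configured to return child items, its output contains a `childItems` array in which each entry has a `name` and a `type` of `File` or `Folder`. The sketch below shows how such an inventory might be reduced to a list of file names. The sample payload and the helper name `list_inbox_files` are illustrative assumptions; in the pipeline itself a ForEach activity would typically iterate over the same array instead.

```python
from __future__ import annotations


def list_inbox_files(get_metadata_output: dict) -> list[str]:
    """Return the names of all files (not folders) listed in a Get Metadata output."""
    child_items = get_metadata_output.get("childItems", [])
    return [item["name"] for item in child_items if item.get("type") == "File"]


# Illustrative payload only; real file names come from the inbox at runtime.
sample_output = {
    "childItems": [
        {"name": "incidents.csv", "type": "File"},
        {"name": "archive", "type": "Folder"},
        {"name": "scan.pdf", "type": "File"},
    ]
}

inbox_files = list_inbox_files(sample_output)
```

Because the list is built from whatever is present in the inbox at runtime, a new file needs no prior registration and no particular name in order to be evaluated.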

Deliverables
Delta tables were created for Police, Parking, and 311 datasets. Valid files were archived in department-specific folders, unsupported files were quarantined for review, and the Central Data Inbox was cleared after processing.
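The archive, quarantine, and cleanup steps can be sketched as a single move that also produces an audit record. The example below uses a temporary local directory as a stand-in for the Central Data Inbox. The folder layout, the `settle_file` helper, and the audit fields are all assumptions made for illustration, and in Fabric the same effect would be achieved with pipeline activities rather than local file operations.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def settle_file(source: Path, destination_dir: Path, outcome: str) -> dict:
    """Move a file out of the inbox and return an audit record describing the move."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / source.name
    # Moving (rather than copying) removes the file from the inbox in the same step.
    shutil.move(str(source), str(target))
    return {
        "file": source.name,
        "outcome": outcome,
        "moved_to": str(target),
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: inbox/, archive/<department>/, quarantine/
    inbox = Path(root) / "inbox"
    inbox.mkdir()
    (inbox / "incidents.csv").write_text("id,type\n1,theft\n")
    (inbox / "scan.pdf").write_text("not tabular")

    audit_log = [
        settle_file(inbox / "incidents.csv", Path(root) / "archive" / "police", "ingested"),
        settle_file(inbox / "scan.pdf", Path(root) / "quarantine", "quarantined"),
    ]
    inbox_remaining = sorted(p.name for p in inbox.iterdir())
```

After both calls the inbox is empty, so a later run has nothing to reprocess, and each audit record states where the file went and when.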

Business Impact
Manual effort was eliminated through full automation, data duplication was prevented, data reliability and governance improved, analytics and reporting accelerated, and complete traceability of file movement was achieved.
