
Document Denoising (Vision + Language)

Intelligent noise removal for scanned documents, framed as a detection problem, not a filter. Generalizes across noise types without manual tuning.

We built a system that removes noise and artifacts from scanned documents. Rather than filtering the image, we treated the problem as object detection: a vision-language model is trained to identify and segment noise regions so that they can be removed. The approach draws on two lines of work. State-of-the-art vision-language models handle tasks such as optical character recognition (OCR), object detection, and segmentation, while cutting-edge object-detection systems are optimized with neural-architecture search and quantization. Combining the two let us recast document denoising as an open-vocabulary detection problem, and we trained the model on our own dataset of noisy PDF pages. Once trained, the system automatically detects blemishes and noise patterns and then reconstructs a clean document.
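To make the "detect, then remove" idea concrete, here is a minimal sketch of the removal step only. It is not the project's implementation. The `Detection` class, the `remove_noise` function, and the `min_score` threshold are hypothetical names introduced for illustration, and the detector itself is replaced by hand-written boxes. It also assumes the simplest possible reconstruction, painting each confident noise box with the page's median (background) value, which may differ from how the real system reconstructs the page.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: pixel box with exclusive x1/y1, plus a score."""
    x0: int
    y0: int
    x1: int
    y1: int
    score: float


def remove_noise(page, detections, min_score=0.5):
    """Return a copy of a grayscale page with confident noise boxes painted over.

    `page` is a list of rows of 0-255 ints. The fill value is the median pixel,
    which assumes (for this sketch) that the page is mostly paper background.
    """
    height, width = len(page), len(page[0])
    background = int(median(px for row in page for px in row))
    cleaned = [list(row) for row in page]  # leave the input untouched
    for det in detections:
        if det.score < min_score:
            continue  # skip low-confidence regions rather than risk erasing text
        # Clamp the box to the page so out-of-range detections are harmless.
        for y in range(max(det.y0, 0), min(det.y1, height)):
            for x in range(max(det.x0, 0), min(det.x1, width)):
                cleaned[y][x] = background
    return cleaned


# Toy 6x8 page: white paper (255) with a dark speck at rows 1-2, columns 2-3.
page = [[255] * 8 for _ in range(6)]
for y in (1, 2):
    for x in (2, 3):
        page[y][x] = 0

detections = [
    Detection(2, 1, 4, 3, score=0.9),  # covers the speck
    Detection(5, 4, 7, 6, score=0.2),  # below the threshold, ignored
]
cleaned = remove_noise(page, detections)
```

Because the filter is driven by detections rather than by pixel statistics, a new noise type only requires the detector to recognize it; the removal step stays the same.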

Outcome: The denoised documents are noticeably clearer and more legible, which makes them suitable for downstream OCR and archiving. The project showed that treating denoising as an object-detection problem lets the model generalize to a wide variety of noise types without hard-coded filters. The same techniques can be adapted to other document-understanding tasks.