FUNSD: Form Understanding in Noisy Scanned Documents

A dataset for Text Detection, Optical Character Recognition, Spatial Layout Analysis and Form Understanding.

Dataset Overview

A dataset for the document understanding community.

  • 200 fully annotated forms

  • 31485 words

  • 9743 semantic entities

  • 10624 relations
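To put these totals in perspective, the short sketch below derives per-form averages from them. The four counts are copied from the list above; the variable names and the choice of ratios are illustrative assumptions, not statistics published with the dataset.

```python
# Totals copied from the dataset overview above.
NUM_FORMS = 200
NUM_WORDS = 31485
NUM_ENTITIES = 9743
NUM_RELATIONS = 10624


def per_form(total: int, forms: int = NUM_FORMS) -> float:
    """Average count per annotated form."""
    return total / forms


# Derived ratios (illustrative; computed only from the totals above).
averages = {
    "words_per_form": per_form(NUM_WORDS),
    "entities_per_form": per_form(NUM_ENTITIES),
    "relations_per_form": per_form(NUM_RELATIONS),
    # Ratio of the two totals; it says nothing about how words are
    # actually distributed across entities.
    "words_per_entity": NUM_WORDS / NUM_ENTITIES,
}

for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

On average, then, a form carries roughly 157 words, 49 semantic entities, and 53 relations, with about 3 words per entity.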


Citation

If you find this dataset useful, please cite our paper:

G. Jaume, H. K. Ekenel, and J.-P. Thiran, "FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents," 2019.

Bibtex format:

@inproceedings{jaume2019,
    title = {FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents},
    author = {Guillaume Jaume and Hazim Kemal Ekenel and Jean-Philippe Thiran},
    booktitle = {submitted to ICDAR-OST},
    year = {2019}
}

Examples

Word grouping and semantic entity labeling.

Img
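As a rough illustration of what word grouping and semantic entity labeling produce, the sketch below models words, entities, and relations as plain Python objects. Everything in it is an assumption made for illustration: the class and field names, the (left, top, right, bottom) box convention, the label strings, and the sample values are hypothetical and do not describe the dataset's actual annotation files.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box convention: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Word:
    text: str
    box: Box


@dataclass
class Entity:
    """A group of words that share one semantic label (hypothetical schema)."""
    id: int
    label: str  # illustrative label names, e.g. "question" or "answer"
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Word grouping: the entity text is its words joined in order.
        return " ".join(w.text for w in self.words)

    @property
    def box(self) -> Box:
        # Smallest box that encloses every word box in the entity.
        if not self.words:
            raise ValueError("entity has no words")
        lefts, tops, rights, bottoms = zip(*(w.box for w in self.words))
        return (min(lefts), min(tops), max(rights), max(bottoms))


# Made-up sample: a two-word question linked to a one-word answer.
question = Entity(0, "question", [
    Word("Date", (10, 10, 40, 20)),
    Word("received:", (44, 10, 100, 21)),
])
answer = Entity(1, "answer", [Word("03/12", (110, 9, 150, 20))])

# A relation is modeled here as a pair of entity ids.
relations: List[Tuple[int, int]] = [(question.id, answer.id)]

print(question.label, repr(question.text), question.box)
```

Keeping relations as pairs of entity ids, rather than nesting answers inside questions, lets one entity take part in several links.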