
Banal
Because Format Checking
is So Trite
Geoffrey M. Voelker
University of California, San Diego
Workshop on Organizing Workshops, Conferences, and
Symposia for Computer Systems (WOWCS’08)
This Talk is Not Very Interesting

Banal is a format checker for PDF documents
    Deduces how a document was formatted
    Optionally compares it with a specification
Intended for conference management systems
    Now being used in HotCRP and EDAS
Seemed timely to document its genesis and implementation
April 15, 2008
WOWCS’08
Why?

Preserving reviewer anonymity
    Acrobat JavaScript that calls home when PDF is loaded
Assisting conference management tasks
    Ensuring anonymity rules
    Possibly helping do initial assignments by mining the bib
Fairness
    Everyone else obeyed the rules…
Time
    Already enough time spent on reviewing
    Frustrated that abuse meant taking even more of my time
How?

Convert PDF to XML (with pdftohtml)
Track the locations of all segments of text, essentially form bounding boxes
Compute margins, columns, body font, etc.
Heuristics for page #s, headers, footers, etc.
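
The steps above can be sketched in a few lines. This is only an illustration, not Banal's actual code: it assumes XML shaped like the output of `pdftohtml -xml` (`page`, `fontspec`, and `text` elements carrying `top`, `left`, `width`, `height`, and `font` attributes), and the sample document, the function names `measure_document` and `check`, and the spec keys `min_margin` and `min_font_size` are all made up for the example. Margins are reported in the XML's own page units, and the body font is guessed as the size that covers the most characters.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample, shaped like `pdftohtml -xml` output (assumption).
SAMPLE_XML = """<pdf2xml>
  <page number="1" top="0" left="0" height="1188" width="918">
    <fontspec id="0" size="15" family="Times" color="#000000"/>
    <fontspec id="1" size="24" family="Times" color="#000000"/>
    <text top="108" left="108" width="702" height="30" font="1">A Paper Title</text>
    <text top="200" left="108" width="340" height="18" font="0">Body text in the left column of the page.</text>
    <text top="200" left="470" width="340" height="18" font="0">Body text in the right column of the page.</text>
    <text top="1062" left="108" width="340" height="18" font="0">Last line of the left column.</text>
  </page>
</pdf2xml>"""


def measure_document(xml_string):
    """Return one dict per page with margins and a guessed body font size."""
    root = ET.fromstring(xml_string)
    # Font specs may be declared on an earlier page than where they are used,
    # so collect them once for the whole document.
    sizes = {f.get("id"): int(f.get("size")) for f in root.iter("fontspec")}
    results = []
    for page in root.iter("page"):
        page_w, page_h = int(page.get("width")), int(page.get("height"))
        boxes, chars_per_size = [], Counter()
        for t in page.iter("text"):
            left, top = int(t.get("left")), int(t.get("top"))
            right = left + int(t.get("width"))
            bottom = top + int(t.get("height"))
            boxes.append((left, top, right, bottom))
            # Weight each font size by how many characters are set in it.
            chars_per_size[sizes.get(t.get("font"))] += len("".join(t.itertext()))
        if not boxes:
            continue  # blank page: nothing to measure
        # The union of all segment boxes is the text block of the page.
        margins = {
            "left": min(b[0] for b in boxes),
            "top": min(b[1] for b in boxes),
            "right": page_w - max(b[2] for b in boxes),
            "bottom": page_h - max(b[3] for b in boxes),
        }
        body_size = chars_per_size.most_common(1)[0][0]
        results.append({"page": page.get("number"), "margins": margins,
                        "body_font_size": body_size})
    return results


def check(metrics, spec):
    """Compare one page's metrics with a (hypothetical) spec; list violations."""
    problems = []
    for side, value in metrics["margins"].items():
        if value < spec["min_margin"]:
            problems.append(f"{side} margin {value} < {spec['min_margin']}")
    if metrics["body_font_size"] < spec["min_font_size"]:
        problems.append(f"body font {metrics['body_font_size']} < {spec['min_font_size']}")
    return problems


if __name__ == "__main__":
    for m in measure_document(SAMPLE_XML):
        print(m, check(m, {"min_margin": 108, "min_font_size": 15}))
```

A real checker would also need the heuristics mentioned above, for example to keep page numbers, headers, and footers from shrinking the measured margins; this sketch treats every text segment as part of the text block.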
Where?

A handful of SIGOPS/SIGCOMM conferences
    OSDI’06, SIGCOMM’07, SIGCOMM’08
Eddie Kohler has integrated it into HotCRP
Henning Schulzrinne also integrated Banal with EDAS
    Since 2006, used for over 800 events
So?

What are our community goals for having formatting requirements?
    Evil: Annoying trifles that negatively impact our ability to communicate our results and ideas?
    Helpful: Reflect practicalities of publishing costs and community time?
Not surprisingly, I’m in the practical camp