Style
When writing a document, there are many stylistic choices that are relatively arbitrary. I collect the arbitrary choices that I've made in the course of my writing here both for my own reference and for my students. Many of these are LaTeX-specific.
You'll likely disagree with much of what I say here. In fact, I disagree with some of it; sometimes the choices here reflect compromises made to minimize friction with co-authors or to minimize paper length (e.g., removing ISBNs from references). Please do not point out places in papers where I've deviated from these. Sometimes I have good reasons for deviating: my style is evolving, my co-authors wouldn't accept my conventions, or the journal stipulated something different. I also realize that, as with every document on writing, there will be typos and mistakes here.
Be sure to check out Neil Spring's useful resources (including a script you can use with flymake when editing LaTeX documents). I disagree with the following rules:
- I use the British convention for quote marks (which makes more sense to computer scientists, as I explain below).
- "sufficient" is a fine word (esp. for "sufficient statistics")
- "Teh" is almost never a misspelling when I write it
The rest is great, however, and you should use it (with the above edits). For my students, this is included as part of the Makefile for generating papers. Look at the end of compilation to see the red flags that it has discovered.
The Economist style guide is a good resource for exposing bad habits, and their conventions are reasonable. However, they are a British publication, so ignore British-specific rules (I do have some other points of disagreement, such as infinitives).
Specifically for scientists, Nature has good resources. Specifically for NLP, Nathan Schneider, Philip and Jason have good advice. For machine learning, there's even official NIPS advice. If you're a native Chinese speaker, this is a useful slidedeck.
Why LaTeX?
Sometimes a new collaborator asks me why I use LaTeX. The bad reason is that it's a convention. It's quite obvious when a paper is not written in LaTeX, and in many scientific fields, it's a sign that you're not really part of the community: papers not written in LaTeX are greeted with more skepticism than papers that are (and often with good reason; in my reviewing experience, non-LaTeX papers are usually worse than LaTeX ones. This is not a rule, but it is a useful heuristic).
But these are social reasons, and they are not particularly convincing if you care about practical benefits. So here are some pragmatic reasons to use LaTeX instead of Word.
- Equations and Math: LaTeX was designed to give clear and
effective renderings of math. Word does a poor job of making math
look good while still letting you edit it later: you can make
equations look good, but they are then hard to edit. Word also makes
inline math very difficult, as it often screws up the
spacing between the lines.
The resistivity $\rho$ of a material gives us the resistance $R=\rho L/A$ of a wire with length $L$ and cross-sectional area $A$.
- Templates: Most venues provide LaTeX templates as their standard templates. These are much easier to use than Word templates, which require far more fiddling.
- You Do Less: In Word, you are tempted to move images and figures around to determine their placement. This is because Word gives you that flexibility, and it's hard to ignore the siren call of tweaking. In LaTeX, you provide suggestions on placement and it does the best it can based on pre-programmed rules (e.g., you tell it that a figure should go at the top or bottom of a page).
- References: One of my biggest annoyances when I am forced to work in Word is how hard it is to easily refer to different chapters, citations, or figures in a way that will be automatically generated. I know there are ways to do it, but it often has huge issues and requires obnoxious plugins. In most cases it is either done manually or requires substantial post-correction. LaTeX does all of these automatically, and it just works.
- Macros: More generally, skilled LaTeX writers develop a set of macros that vastly improves productivity and consistency. These range from typesetting conventions to references (e.g., to the name of your approach, institutions, etc.). Because LaTeX makes it easy to import files, you can share these macros across the many documents that you're working on at the same time.
- Collaboration: Because LaTeX is just text files and it is easy to include one file in another, you can use tools like git to work with others as you create your paper.
- Cost: Unlike Word, it's free and open source. LaTeX hasn't changed much in 20 years, so you can still read and compile papers without anything being messed up. Word changes every five minutes.
- Accessibility: When I communicate with blind researchers, they always ask if I have a LaTeX version. They can write and read LaTeX much more easily than Word or PDF (especially when there's math; screen readers always screw up embedded equations in Word).
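Here is a minimal sketch of the macro and collaboration points above. It assumes a hypothetical shared file `macros.tex` and a hypothetical macro name `\sysname`. Each paper reads the same definitions with `\input`. Because both files are plain text, git can track and merge them like any other source.

```latex
% macros.tex (hypothetical file name): definitions shared by every paper
\usepackage{xspace}                       % adds a following space only when needed
\newcommand{\sysname}{FancyPants\xspace}  % hypothetical name of your approach

% paper.tex
\documentclass{article}
\input{macros}  % read the shared definitions in the preamble
\begin{document}
\sysname finds fancy pants; renaming the approach later is a one-line change.
\end{document}
```

Renaming the approach then means editing one line in `macros.tex` rather than every mention in every paper.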
If you're going to ignore my advice, at least make your Word documents look nice (but it's a lot of work).
If you're new to LaTeX, Typeset.io has a great introduction to how to get started.
Why not settle for "Good Enough"?
Writing scientifically is an art and always has been. While computers make life easier, they don't absolve you of the responsibility to express clearly and cleanly what you want to say.
Writing Clearly and Directly
While writing, it's important to write clearly. There are whole books about this (mentioned above), but it might be useful to go through an example of what this looks like.
Follow-up work can be directed specifically at the game of Diplomacy or can be applied broadly to human-in-the-loop research. This task can be scoped at a general or a player-specific basis. In the general case, the data is trained on all players but a certain one. Then the actions for only that player are predicted. This can be made country-specific---train on data from the majority of users playing as Turkey and test on several held out players. This would ascertain whether linguistic cues or specific game actions are more likely to reveal an impending betrayal. Alternatively, we can use our lie-detection models for a human-in-the-loop approach to identifying deception. The model can be used creating a player interface that provides model insights to players in real-time to make better game decisions. Or the data can be used to create dialog for a Diplomacy bot, although ensuring that the language corresponds to an appropriate scenario is integral.
- scoped at a general or player-specific basis: both "basis" and "scoped" are vague and jargony. It would be more direct to say: "When we look for lies, we can either look for what a lie looks like in general or what a specific person's lies look like."
- Alternatively, we can use our lie-detection models for a human-in-the-loop approach to identifying deception: "Approach" is another vague word, and introducing a new idea should connect to the first point. Better to say: "Instead of general lie detection, you would rather know if you are being lied to in the moment."
- The model can be used creating a player interface that provides model insights to players in real-time to make better game decisions: While there's no blanket prohibition against the passive, it can sometimes obscure agency. "The model can be used" doesn't explain how the model can help someone. Better to say: "Our models can find words, phrases, and situations that portend deception and warn potential victims before it's too late."
- Or the data can be used to create dialog for a Diplomacy bot: Needs a transition from other concepts in paragraph but might be better in its own paragraph. Let's focus on transitions a little more.
Transitions
Famously, \abr{ibm} Watson was victorious in a match against the best trivia players in the world, and in this thesis I describe an exhibition event where my system also defeated decorated trivia players. Despite my system defeating good trivia players, it was hollow since I knew that our system was nothing more than well-done pattern matching---i.e., nothing like human reasoning.
Let's put aside "victorious" (generally, prefer verbs to adjectives). There are three ideas here, but the "and" joining the first two doesn't add much. The real contrast is between the second two ideas. The final idea obscures that contrast with the vacuous "it", so it's good to repeat "victory" here.
Famously, \abr{ibm} Watson defeated the best trivia players. While the system described in this thesis likewise defeated decorated trivia players, this victory was hollow since our system was nothing more than well-done pattern matching---i.e., nothing like human reasoning.
Tense
- Almost everything should be in the present tense
- This includes related work (even super old stuff like Aristotle)
- This includes generative processes
- It's okay to use the future tense for future work
- Exception for HCI papers:
- findings from past studies are in past tense or, if they still apply today, are in present perfect
- method detail from past studies is in past tense
- our own approach, method, and specific findings are in past tense
- general conclusions of the paper / findings are in present tense because they are still true (rather than completed and in the past) at the time the paper is written (and hopefully read!)
- There's also a rare exception to be made when you need to
distinguish how something was done in the past from how it is used
today.
Retrofitting was originally introduced for refining monolingual word embeddings with a lexical ontology~\citep{faruqui-15}; for \abr{clwe}, we retrofit using the training dictionary $\mathcal{D}$ as the ontology.
Punctuation
I have put the most annoying and frequent issues at the top of the list.
- Put footnotes immediately after sentence / clause
punctuation. Do not put footnotes before final punctuation (common
mistake).
I'm never ever sick at sea.\footnote{Well, hardly ever.}
- Use the fnpct package to make the interaction between footnotes and punctuation less awkward.
- Some believe that you can put a footnote before punctuation if you're following German rules. This is not actually the case (it's more complicated, and our footnotes are usually whole sentences; we use bibliographic citations within sentences):
Footnotes can stand without a final period if they consist only of single words. It is better, however, to treat them as elliptical sentences and add a period (especially when footnotes that are whole sentences ending in a period appear in the same text). (Duden 2016, translated from German)
- This goes for affiliations as well; put the footnote after the comma (this is a case where the German rules would put the footnote before the comma, but don't do that).
- Exception: footnotes can come before a dash.
A note number should generally be placed at the end of a sentence or at the end of a clause. The number normally follows a quotation (whether it is run into the text or set as an extract). Relative to other punctuation, the number follows any punctuation mark except for the dash, which it precedes.
- Don't put footnotes on a number (it looks like an exponent).
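Here is a minimal sketch of loading fnpct with its default options. According to its documentation, the package kerns footnote marks against neighboring punctuation, and some options can reorder a mark written before a period. Check the options for your installed version rather than relying on this sketch. Writing the mark after the punctuation, as the rules above require, remains your job.

```latex
\documentclass{article}
\usepackage{fnpct} % adjusts how footnote marks sit next to punctuation
\begin{document}
% The mark follows the sentence-final period, not the other way around.
I'm never ever sick at sea.\footnote{Well, hardly ever.}
% The same holds after a comma, e.g., in an affiliation list.
A comma works the same way,\footnote{As do affiliations.} and so does a period.
\end{document}
```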
- Do not use a comma between two verb phrases (no subject) joined by a conjunction.
I washed the car and mowed the lawn.
- Do use a comma between two independent clauses (subject and
verb) joined by a conjunction.
I washed the car, and I mowed the lawn.
- Use an Oxford comma to set off the penultimate element in a list.
I had eggs, toast, and orange juice.
- To sum up, whenever you have a comma before an "and" or "but" joining two things, see whether you have both a subject and a verb in what follows. If you don't, you shouldn't have a comma there.
- Use LaTeX hyphens (-), en dashes (--), and em dashes (---) correctly.
This is a low-budget trip, and pages 78--101 describe the journey. If the train is on time---which it never is---we'll get there tomorrow.
- Hyphens are needed when you have a phrase with a modifier that
itself becomes a modifier.
Full supervision is better than weakly-supervised training.
- You don't want LaTeX to break lines at some hyphens, so you need to use:
$p$\nobreakdash-adic $n$\nobreakdash-\hspace{0pt}dimensional space
Here \nobreakdash forbids a line break at the hyphen that follows it, and the zero-width \hspace{0pt} still lets LaTeX hyphenate within the word "dimensional".
- Put punctuation outside quotes (British / Canadian style, not American). This is hard for some people to do, particularly for over-ambitious American students who want to be "right". But doing things the British way is not just clearer but defensible from a linguistic standpoint, as beautifully argued by Pullum.
Also make sure that your quotes go in the right direction; don't draft in MS
Word and copy-paste into a TeX document. If you do, your quotes will be
wrong (TeX uses different symbols for left and right quotes).
In civilized cultures, this is called a ``coaster''.
- I use M-q to word wrap (fill) paragraphs in emacs. Like putting two spaces after a sentence, it looks better in editing but has no impact on layout (more practically, it helps you estimate text length while editing).
Figures
- If a figure has a caption, use the top or bottom
positioning. Center the actual image, and be sure to provide a
label.
\begin{figure}[tb]
\centering
\includegraphics[width=0.9\linewidth]{2015_hoverboard/awesome}
\caption{Jetpacks are next.}
\label{fig:hoverboards}
\end{figure}
- If the figure has no caption, you can use "here".
- Make sure the caption stands on its own. When a reviewer first sees a paper, they will often just skim the figures. Make sure that if a reader does this they will get a sense of what the paper is about and the clear takeaway from each figure.
- Try to use full sentences in the caption.
- Use raster / vector formats as appropriate. In general, photos
should use lossy compressed formats (jpg); hand-drawn cartoons, logos, and images with
alpha channels should be low-palette rasters (png); and everything else
should be a vector format (pdf/eps). This is another way that LaTeX is
far superior to Word, where it's much harder to insert graphics without
losing their vector format. (Omnigraffle and ggplot2 by default create
vector pdfs; another reason to use them!)
- Don't supply file extensions; pdflatex/latex will choose for you (makes it easy to switch between the two).
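Here is a minimal sketch of an extension-free include. It assumes a hypothetical file `figures/pants.pdf` (or `figures/pants.eps` when compiling with latex). graphicx searches the extensions that its driver knows, so the same source works with both engines.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\begin{figure}[tb]
  \centering
  % No extension: graphicx tries the extensions its driver supports
  % (e.g., .pdf/.png/.jpg with pdflatex, .eps with latex+dvips).
  \includegraphics[width=0.9\linewidth]{figures/pants}
  \caption{The FancyPants algorithm can find fancy pants.}
  \label{fig:pants}
\end{figure}
\end{document}
```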
- Don't use vertical
rules in your tables. It's fine to use shading (light gray)
when you have text (like lists of topic words) that makes it hard to see when
one row starts and another ends. Check out a full guide to making tables.
\rowcolors{2}{gray!25}{white}
\begin{tabular}{cp{10cm}}
\hline
\rowcolor{gray!50} Topic & Terms \\
\hline \hline
1 & Lady Leicester Scrooge Dedlock Rouncewell ladyship Wold Chesney Ghost Volumnia Christmas Tulkinghorn family Spirit Baronet nephew Rosa Scrooge's housekeeper Lady's \\
2 & Richard Jarndyce guardian Ada Charley Caddy dear Skimpole Miss Summerson Esther Jellyby miss Vholes Kenge Woodcourt quite myself Guppy Chancery \\
3 & says George Bucket Snagsby Guppy returns Smallweed Bagnet comes Tulkinghorn looks takes trooper does makes friend goes asks cries Chadband \\
4 & Oliver replied Bumble Sikes Jew Fagin boy girl Rose Brownlow dear gentleman Monks Noah doctor Giles Dodger lady Nancy Bill \\
5 & Nicholas Nickleby Ralph Kate Newman replied Tim Mulberry Mantalini Creevy brother Noggs Madame Gride Linkinwater Smike Arthur rejoined Wititterly Ned \\
6 & Pickwick Winkle replied Tupman Wardle gentleman Snodgrass Pickwick's Perker fat boy Bardell dear Jingle inquired Fogg Dodson friends friend lady \\
7 & Lorry Defarge Doctor Manette Pross Carton Darnay Madame Lucie Monseigneur Cruncher Jerry Stryver prisoner Charles Monsieur Tellson's Marquis father Paris \\
8 & coach uncle gentleman lady box coachman gentlemen landlord get London guard inside horses waiter boys mail passengers large better hat \\
9 & street door streets windows houses room window few iron walls wall rooms dark within shop doors corner small stood large \\
10 & money letter paper business read pounds papers five hundred office thousand clerk paid years pen next law desk letters week \\
\hline
\end{tabular}
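The shading commands in the table above (`\rowcolors`, `\rowcolor`) come from xcolor's `table` option, which loads colortbl. Below is a trimmed, self-contained sketch of the same layout that keeps only three of the rows.

```latex
\documentclass{article}
\usepackage[table]{xcolor} % "table" loads colortbl for \rowcolor and \rowcolors
\begin{document}
\rowcolors{2}{gray!25}{white} % alternate shading from row 2 on
\begin{tabular}{cp{6cm}}
  \hline
  \rowcolor{gray!50} Topic & Terms \\ % \rowcolor must start its row
  \hline \hline
  1 & Lady Leicester Scrooge Dedlock Rouncewell \\
  2 & Richard Jarndyce guardian Ada Charley \\
  3 & says George Bucket Snagsby Guppy \\
  \hline
\end{tabular}
\end{document}
```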
- When you refer to a figure, don't use "see". Just put the reference.
- Use ggplot2 or plotnine for graphics, and make sure notation is consistent with LaTeX. Store your data in the same repository as your paper and have a Makefile command to regenerate figures from the original data. It's important to make the data that you use to generate the plots accessible so that others can run statistical tests and fairly compare to your results if they're so inclined. Keynote/Excel are not good options for this because not everyone can afford these programs, and they store data in a binary format that does not age well as versions and operating systems change over time. ASCII ages much better. While Seaborn or other plotting packages would meet my technical needs, I also like that our group has a consistent pipeline and "look". This means that students can share best practices and help each other make good plots (this doesn't happen if everyone is using different plotting software).
The FancyPants algorithm can find fancy pants (Figure~\ref{fig:pants}).
Math
- Make sure that all equations and math are nouns and are embedded in a sentence with appropriate punctuation.
The voltage is $V=IR$, where $I$ is the current.
- When in text, spell out all natural numbers less than or equal to one
hundred. Also spell out "infinity" unless it's in an equation or an
algorithm. Anything else should be represented in decimal unless prefixed
by the base (e.g., "0b00010"). In general, follow the Chicago style for
numbers.
- Spell out big round numbers like thousand, billion, etc.
- For example:
Over the entire test set of 200 examples, with full recall ($r=1$), the explanation accuracy is $80.4\%$; when the recall fails ($r=0$)
- Never start a sentence with a number represented with numerals.
- It's okay to break these rules to ensure consistency in the same sentence or paragraph. E.g., "The final score was 315 to 10".
- It's okay to use numerals for Likert scales: "5-point scale".
- In math mode, make sure to write numbers so there isn't
an extra space after the comma (there's a package, siunitx, that
does this more generally, but that's overkill):
$11{,}568$
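If a document does have many such numbers, here is a sketch using siunitx's `\num`. The comma separator is an assumption chosen to match `$11{,}568$`; by default, siunitx groups digits with a thin space.

```latex
\documentclass{article}
\usepackage{siunitx}
% Use a comma between digit groups instead of the default thin space.
\sisetup{group-separator = {,}}
\begin{document}
By hand: $11{,}568$ examples. With siunitx: \num{11568} examples.
\end{document}
```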
- Do not use more significant digits than you need to (or should!).
- Do not use phrases like "The entropy is defined as follows" when introducing an equation. "The entropy is" or "we define entropy as" work better.
- Remind readers what variables are if you haven't used them in a while.
- When creating a multiline equation, use the align
environment, and keep the alignment to the right of the
equals sign.
% The {} after = keeps the relation's spacing when & follows it.
\begin{align}
4 ={} & 2 + 2 \\
={} & 1 + 1 \\
& {} + 1 + 1 \\
={} & 4
\end{align}
- If you use \MoveEqLeft, you're probably using align incorrectly.
- Refer to equations as parenthetical appositives or nouns:
The gradient (Equation~\ref{eq:grad}) is further simplified in Equation~\ref{eq:simp-grad}.
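Here is a short sketch that ties several of the math rules together. The equation is a noun inside a sentence and is punctuated as prose. It is numbered and labeled with the `eq:` prefix, and the text refers to it by name rather than with "as follows". The entropy example is illustrative, and the label name is hypothetical.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
We define the entropy of a discrete random variable~$X$ with distribution~$p$ as
\begin{equation}
  H(X) = -\sum_{x} p(x) \log p(x),
  \label{eq:entropy}
\end{equation}
where the sum ranges over every value~$x$ that $X$ can take.
The entropy (Equation~\ref{eq:entropy}) is largest when $p$ is uniform.
\end{document}
```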
- Create macros for the following and use them consistently
(or ask me for a preamble that defines these things)
- Underbraces for explaining complicated mathematical concepts
\newcommand{\explain}[2]{\underbrace{#2}_{\mbox{\footnotesize{#1}}}}
- Gamma function with balanced parentheses
\newcommand{\G}[1]{\Gamma \left( \textstyle #1 \right)}
- Log gamma function with balanced parentheses
\newcommand{\LG}[1]{\log \Gamma \left( \textstyle #1 \right)}
- Digamma function with balanced parentheses
\newcommand{\digambig}[1]{\Psi \left( #1 \right) }
- Digamma function with balanced parentheses but with squished fractions
\newcommand{\digam}[1]{\Psi \left( \textstyle #1 \right) }
- Expectations
\newcommand{\e}[2]{\mathbb{E}_{#1}\left[ #2 \right] }
- Entropies
\newcommand{\h}[2]{\mathbb{H}_{#1}\left[ #2 \right] }
- Indicator variables
\newcommand{\ind}[1]{\mathds{1}\left[ #1 \right] }
- Exponential
\newcommand{\ex}[1]{\mbox{exp}\left\{ #1\right\} }
- Partial derivatives
\newcommand{\D}[2]{\frac{\partial #1}{\partial #2}}
- ELBO (for variational inference)
\newcommand{\elbo}{\mathcal{L}}
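The macros above need a few packages. `\mathbb` comes from amsfonts (loaded by amssymb), and `\mathds` comes from dsfont. Here is a minimal sketch that loads them and uses several of the macros. The variational bound is only an illustration.

```latex
\documentclass{article}
\usepackage{amssymb} % loads amsfonts, which provides \mathbb
\usepackage{dsfont}  % provides \mathds for the indicator
\newcommand{\explain}[2]{\underbrace{#2}_{\mbox{\footnotesize{#1}}}}
\newcommand{\e}[2]{\mathbb{E}_{#1}\left[ #2 \right] }
\newcommand{\ind}[1]{\mathds{1}\left[ #1 \right] }
\newcommand{\D}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\elbo}{\mathcal{L}}
\begin{document}
The evidence lower bound is
\begin{equation}
  \elbo = \explain{expected log joint}{\e{q}{\log p(x, z)}} - \e{q}{\log q(z)},
\end{equation}
and we optimize it by following its gradient $\D{\elbo}{\lambda}$ with respect to each variational parameter~$\lambda$.
The count $\sum_{n} \ind{z_n = k}$ is the number of tokens assigned to topic~$k$.
\end{document}
```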
- Do not use italics (the default in math mode) for distributions; it helps to define a macro
\theta \sim \mbox{Dir}(\alpha)
- Don't use a pipe (alone) for given; make sure there's space around it
p(a\, | \,b) = \frac{p(b\, | \,a) p(a)}{p(b)}
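Following the advice to define a macro, here is a minimal sketch with the hypothetical names `\Dir` and `\given`. `\operatorname` (from amsmath) sets the name upright with operator spacing. The standard `\mid` is an alternative for the bar, since it already adds relation spacing.

```latex
\documentclass{article}
\usepackage{amsmath}
\newcommand{\Dir}{\operatorname{Dir}} % upright distribution name
\newcommand{\given}{\,|\,}            % "given" bar with space on both sides
\begin{document}
We draw $\theta \sim \Dir(\alpha)$, and Bayes' rule reads
\begin{equation}
  p(a \given b) = \frac{p(b \given a)\, p(a)}{p(b)}.
\end{equation}
\end{document}
```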
- Don't use default parens for big math items; use automatically
resized parens
I = \left( \frac{V}{R} \right)
- Don't start sentences with a math symbol.
- Obey Mermin's rules.
- Number all displayed equations
- Have a name for every equation and a name for every symbol
- Punctuate every equation as prose, treating it as a noun
- Vector/matrix convention (stolen from Yoav Goldberg):
- Use bold upper case Latin letters to represent matrices and bold lower-case Latin letters to represent vectors.
- Use a semicolon to denote vector concatenation.
- Use upper case calligraphic Latin letters for sets.
- Use upper case un-bolded Latin letters for scalars.
- Use lower case Greek letters for latent variables (bolded or not depending on whether it's a vector).
- Use subscripts to represent entries from those vectors and superscripts to identify different flavors (e.g., different times or layers). In the rare case that you exponentiate vectors or matrices, put them in parentheses first.
- Use an empty circle for a Hadamard product.
A \circ B
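Here is a sketch that applies the vector/matrix convention to a generic layer. The layer, the `\tanh` nonlinearity, and the gate are illustrative assumptions, not taken from any particular paper.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Bold upper case: matrices; bold lower case: vectors; ";" concatenates;
% calligraphic: sets; plain upper case: scalars; superscript: layer ("flavor").
The hidden vector of layer~$\ell$ depends on the concatenation
$[\mathbf{x}; \mathbf{h}^{\ell-1}]$ of the input~$\mathbf{x}$ and the previous layer's output:
\begin{equation}
  \mathbf{h}^{\ell} = \tanh\left( \mathbf{W}^{\ell} \left[ \mathbf{x}; \mathbf{h}^{\ell-1} \right] + \mathbf{b}^{\ell} \right),
\end{equation}
where the matrix~$\mathbf{W}^{\ell}$ and the vector~$\mathbf{b}^{\ell}$ are parameters,
each input word comes from the vocabulary set~$\mathcal{V}$, and the scalar~$H$ is the number of hidden units.
A gate~$\mathbf{g}$ rescales the output entry by entry as $\mathbf{g} \circ \mathbf{h}^{\ell}$,
while $(\mathbf{W}^{\ell})^{2}$ squares a matrix instead of naming another layer.
\end{document}
```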
References within a document
- Prefix the label of a reference with the type of reference it is (use the first three letters of the type of thing it is: fig, tab, sec, etc.).
- Keep references short and do not include spaces in the label (use
hyphens if you must have more than one word, but this should be a
rarity)
\label{sec:romulan-treaty}
- When using an inline reference, make sure there's a tilde between the noun and the number. This makes sure that this doesn't get split between lines (non-breaking space).
\section{Introduction} \label{sec:intro}
Long cat (Section~\ref{sec:intro}) is long.
Stupid rules you should follow
Even though they're not really true, people get twitchy when you abuse them. So learn them as rules for professional writing (even though they're hogwash).
- Use "fewer" when comparing discrete amounts, and use "less" when
comparing continuous amounts.
To weigh less, he should eat fewer donuts.
- Use "that" for restrictive clauses (no commas). Use "which" for non-restrictive
clauses (use commas).
The pizza, which everyone loves, is from Conte's. The pizza that fell on the floor is no longer edible, however.
- Use "who" only with the nominative case (and
"whom" in other cases).
- who can use this (subject of "can use")
- for whom we designed this (object of preposition)
- who gave a lecture (subject of verb "gave")
- Caltech tenured whom? (object of verb "tenured")
- Only use "data" as a plural noun (with one exception).
- Likewise, only use "media" as a plural noun.
- Trickier are "criteria" (plural) and "criterion" (singular): "Some good criteria are ..." vs. "A good criterion is ...". Avoid sounding pompous but still be technically correct (the best kind of correct) by using "criteria" as a plural noun.
Anti-rules
Some people don't think you should do the following, but I disagree. Feel free to completely ignore these non-rules.
- There's no blanket prohibition against the passive, but don't use it to obscure agency.
- It's fine to end your sentence with a preposition if it sounds natural.
- It's fine to boldly split an infinitive.
- It's fine (and often better) to use "they" as a singular gender-neutral pronoun.
Referencing Related Work
- When you cite something, make sure the reader has enough context to know why you cited it.
We analyze the messages through persuasion tactics, which have been used in cybersecurity~\cite{oliveira-17}.
This is accurate enough, but "cybersecurity" is so broad. Is this even looking at text? Make it clear that you read the paper:
We analyze messages using persuasion tactics~\cite{cialdini-04}, which \newcite{oliveira2017dissecting} apply to detect spear phishing e-mails and \newcite{anand-11} apply to persuasive blogs.
Citations
Below, I suggest removing valuable metadata. This is for two reasons: space and uniformity. Unfortunately, the bibliography often counts against page length, so we need to keep our bibliographies as short as possible while keeping them useful. Because we often have to strip bibtex entries, it helps to keep every entry compact from the start; this preserves uniformity and avoids last-minute massaging. Just make everything obey these rules when you put it into your bibtex for the first time.
- Use bibtex and use it correctly
- Do not abbreviate names (i.e., use "Thomas Stearns Eliot", not "T. S. Eliot") in your files. Bibtex will abbreviate them for you when a style calls for it, and you will be out of luck if your file has only initials but the citation style requires full names (like ACL's).
- Make sure that you are using the correct citation type for
articles, books, chapters, etc.
- For example, if you are citing an entry from The Encyclopedia of Artificial Intelligence, you don't cite the whole book. You'd cite an individual chapter (e.g., the chapter on "Question Answering" by Bonnie Webber, pages 814-822). The editor (Stuart Shapiro) is not the author and shouldn't appear in the text.
- Theses have their own bibtex type.
- Put proper nouns (e.g., "Dirichlet", "Bernoulli", and
"Pitman-Yor" are all named after people) and model/dataset names
(e.g., QuAC, BERT) in your bibtex file in brackets to make sure
they get capitalized correctly.
title = {Latent {D}irichlet Allocation}, title = {A qualitative comparison of {CoQA}, {SQuAD 2.0}, and {QuAC}},
- Do not put the entire title in brackets (will bite you later).
- Make sure all raw titles have consistent capitalization (something that looks good).
- Define journal and conference names in a different bib file from the
one you store your references in. This way you can swap out long form title
(Proceedings of the Association for Computational Linguistics) with the
short form title (ACL) if you're tight for space. Plus, it makes sure your
citations are consistent.
@String{cl="Computational Linguistics"}
@String{chi="International Conference on Human Factors in Computing Systems"}
@String{cvpr="Computer Vision and Pattern Recognition"}
@String{coling="Proceedings of International Conference on Computational Linguistics"}
@String{colt="Proceedings of Conference on Learning Theory"}
@String{conll="Conference on Computational Natural Language Learning"}
@String{eacl="Proceedings of the European Chapter of the Association for Computational Linguistics"}
@String{ecml="Proceedings of European Conference on Machine Learning"}
@String{emnlp="Proceedings of Empirical Methods in Natural Language Processing"}
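Here is a sketch of the two-file setup. The file names `venues.bib` and `refs.bib` are hypothetical, and the entry is a placeholder, not a real paper. In a .bib file, `%` comments work only outside entries, and a comment must not contain an at-sign, because BibTeX would try to parse it as an entry.

```latex
% venues.bib (hypothetical file name): string definitions only.
% To save space, change the long form below to the acronym, e.g., "EMNLP".
@String{emnlp = "Proceedings of Empirical Methods in Natural Language Processing"}

% refs.bib (hypothetical file name). The entry below is a placeholder;
% booktitle names the string without quotes or braces.
@inproceedings{placeholder-20,
  author    = {First Author and Second Author},
  title     = {A Placeholder Title about {BERT}},
  booktitle = emnlp,
  year      = {2020},
}
```

In the paper, list the venue file first, as in `\bibliography{venues,refs}`. BibTeX must read a string's definition before any entry that uses it.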
- Put citations inside the relevant sentence but outside quotes.
The rug really ``tied the room together''~\cite{dude-98}.
- Citations should be attached to the end of the most relevant noun
phrase. Don't put everything at the end of the sentence (common mistake).
Applications of topic models~\cite{boyd-graber-07} include predicting ideal points~\cite{gerrish-10} and collaborative filtering~\cite{wang-12}.
- Do not include page numbers, publishers, or locations for conferences (nobody reads the paper versions anyway).
- Do not include volumes for NeurIPS. People just use the year.
- Do not include DOIs or ISBNs.
- For numeric citation styles (e.g., NeurIPS), don't use inline citations as nouns; use the name.
Moseby~\cite{moseby-11} shows that he can pull off red boots.
- For natbib-like citations (e.g., ACL), use citet/newcite to cite as a noun. Don't use a non-breaking space in this case.
\citet{moseby-11} shows that he can pull off red boots.
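Here is a minimal sketch of the noun and parenthetical forms side by side with natbib (ACL style files also provide `\newcite` for the noun form). The bibliography file `refs.bib` is hypothetical and would need to define `moseby-11`.

```latex
\documentclass{article}
\usepackage{natbib}
\begin{document}
% As a noun: the name is part of the sentence, so no tilde before it.
\citet{moseby-11} shows that he can pull off red boots.
% Parenthetical: attach it to the relevant phrase with a tilde.
He can pull off red boots~\citep{moseby-11}.
\bibliographystyle{plainnat}
\bibliography{refs} % hypothetical .bib file that defines moseby-11
\end{document}
```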
- Don't include material that's only title, author, and year. That's not enough information to find it. Make sure you include either a web address (ArXiv), a university (theses), or a venue (most things).
- When introducing a paper with, for example, an acronym you'll use
throughout the paper, put the acronym in the citation brackets.
We compare word-level representations~\cite[Word2Vec]{mikolov-13} against document-level representations~\cite[\abr{lda}]{blei-03}.
- Many ArXiv citations actually shouldn't be. If it has appeared in a real venue (or will), give that citation. Often this changes between when you submit a paper and when it's accepted. Double check all of your ArXiv citations.
- Don't cite both CoRR and ArXiv. Pick one and stick with it (ArXiv would be my suggestion).
- Don't do a citation dump with more than two cites together. They can't all be identical. Either cut or break apart so people know why you're citing each paper.
- Be specific when you use a citation so that the reader knows
why you're using a citation (it also proves you've read and
understood the paper). So don't do:
Thus, here we analyze the game through persuasion tactics~\cite{cialdini2004social} (\textit{italicized} this section), which has been done in cybersecurity research~\cite{oliveira2017dissecting}.
This doesn't give enough information about what the paper did. Something with cybersecurity and Cialdini. It's not even clear that this is about text (important in an NLP paper). Instead, write with specifics:
We analyze this case study using rhetorical tactics~\cite{cialdini-04}, which \newcite{oliveira2017dissecting} use to dissect spear phishing e-mails and \newcite{anand-11} apply to persuasive blogs.
- Use my preamble to turn missing references into giant red
boxes so you don't miss them while drafting.
\newcommand*{\missingreference}{{\Huge \colorbox{red}{?reference?}}}
\newcommand*{\missingcitation}{{\Huge \colorbox{red}{?citation?}}}
\makeatletter
\def\@setref#1#2#3{%
  \ifx#1\relax
    \protect\G@refundefinedtrue
    \nfss@text{\reset@font\missingreference}%
    \@latex@warning{Reference `#3' on page \thepage \space undefined}%
  \else
    \expandafter#2#1\null
  \fi}
\def\@citex[#1]#2{\leavevmode
  \let\@citea\@empty
  \@cite{\@for\@citeb:=#2\do
    {\@citea\def\@citea{,\penalty\@m\ }%
     \edef\@citeb{\expandafter\@firstofone\@citeb\@empty}%
     \if@filesw\immediate\write\@auxout{\string\citation{\@citeb}}\fi
     \@ifundefined{b@\@citeb}{\hbox{\reset@font\missingcitation}%
       \G@refundefinedtrue
       \@latex@warning
         {Citation `\@citeb' on page \thepage \space undefined}}%
       {\@cite@ofmt{\csname b@\@citeb\endcsname}}}}{#1}}
\makeatother
- John Owens has more advice, all of which I agree with except for including page numbers. I don't include page numbers for conference cites because nobody actually reads paper versions any more and it can sometimes push you over the limit when references count toward lengths (I'm looking at you, AAAI).
Typography
- Learn how to use non-breaking spaces
correctly. These replace normal whitespace between pieces of text
that would be confusing or would look bad if they started a line.
In all of these cases, if a line started with whatever is after
the tilde (~), it would be confusing.
- Some names
Pope Innocent~III was so bad at reading maps he confused Byzantine with Jerusalem.
- Sections in papers
Section~\ref{sec:dandolo} argues that Enrico Dandolo should stay quiet on road trips and not claim that he knows better than the \abr{gps}.
- Years
The 1204~\abr{ad} partition treaty shows what can go wrong when you think there's an In and Out at a rest stop when there is only a Red Robin.
- Units
Alexios~V was offered 1~kg gold glasses by his father-in-law.
- Do not also put a normal space on either side of the tilde: that completely removes the effect.
- For topic modeling papers, put mentions of words in quotes
and topics (e.g., from a topic model) in underline (topic macro).
We wanted to understand why the \underline{research} topic has high probability for ``procrastination'', but we leave that for future work.
- For question answering papers, put question words in quotes
and answers in underline.
If you see ``phosphonium ylide'', the system always responds \underline{Wittig Reaction}.
- If you are writing a question answering paper that uses topic models, this bullet point will have to be updated.
- Use small caps for acronyms (it looks better; I create a macro called
"abr" to make this easier), and don't put periods after the letters in
acronyms (the reason why: LaTeX uses smart spacing after periods, but
doesn't handle sentence-final acronyms; this will make your life easier).
\textsc{tla}s are popular in the \textsc{us} government.
The interpolation factor~$\lambda$ is inspired by \abr{searn}~\cite{daume-06}.
- If you have an abbreviation mid-sentence, make sure LaTeX knows not to
use sentence final spacing.
Crosby \textit{et al}.\ have a very, very, very fine house.
- Because I learned to type on monospaced fonts, I put two spaces after a period that ends a sentence. There's no good reason to do this, but LaTeX knows to do the right thing in typesetting. I continue to do it, though, because I think it looks better when editing in emacs with fixed-width fonts. (You don't have to do it, but if you see me doing it, that's why; you don't have to worry about it / correct it.)
- Put features (for supervised learning) in monospaced fonts
(feat macro).
The number of words in a document (e.g., \texttt{doclen:13}) is the final feature.
- If it's allowed (ICML 2013 didn't, for instance), use the "times" package for serif fonts.
- For ordinals, write them out if they're less than or equal to one
hundred. For ordinals greater than one hundred, use a text superscript.
The Second Reich loved its 247\textsuperscript{th} pointy helmet design.
- Write ligatures, umlauts, and other diacritics correctly.
If you want to know what a f\^ete galante looks like to a physicist, look up Schr\"odinger in the Encyclop\ae dia.
- Put a non-breaking space between a word and a following reference, citation, or math symbol. Do not put a normal space before the tilde; this defeats the purpose. Your word should end, be followed by a non-breaking space, and then the reference, citation, or symbol. This prevents a line break between a reference and the thing it's connected to.
Figure~\ref{fig:pony} shows a pony with a mass~$m$ calculated by Euler's method~\cite{horse-mass}.
- When presenting foreign text, use the Leipzig glossing rules.
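For reference, here is a minimal sketch of a document that defines the abr, topic, and feat macros mentioned above and typesets one interlinear gloss. The macro names come from the text, but these definitions are my assumption (small caps, underline, and monospace, as described). The gloss uses the gb4e package, which is only one of several LaTeX glossing packages, and the German sentence with its glosses is just an illustration of the Leipzig conventions (word-by-word alignment, grammatical categories in small caps).

```latex
\documentclass{article}
% gb4e provides the exe/ex environments and \gll/\glt for interlinear glosses.
% It is commonly recommended to load it after other packages.
\usepackage{gb4e}

% Hypothetical definitions of the macros named in the text.
\newcommand{\abr}[1]{\textsc{#1}}      % acronyms in small caps
\newcommand{\topic}[1]{\underline{#1}} % topics from a topic model
\newcommand{\feat}[1]{\texttt{#1}}     % features for supervised learning

\begin{document}
The interpolation factor~$\lambda$ is inspired by \abr{searn}.
We wanted to understand why the \topic{research} topic has high
probability for ``procrastination''.
The number of words in a document (e.g., \feat{doclen:13}) is the final feature.

\begin{exe}
  \ex
  \gll Ich habe das Buch gelesen.\\
       I have.\textsc{1sg} the.\textsc{acc.n} book read.\textsc{ptcp}\\
  \glt `I have read the book.'
\end{exe}
\end{document}
```

Because \textsc only shrinks lowercase letters, type acronyms in lowercase inside \abr (\abr{searn}, not \abr{SEARN}); uppercase input prints as full-size capitals.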
Word choice
I have put the most annoying and frequent issues at the top of the list and made the font size large.
- You can always delete "it turns out that", "we find that", "we show", "we provide", "our findings/results suggest that" (and variants). If you want to actually hedge, then explain why there's some doubt (i.e., don't use "suggest" as a flimsy shield).
- Never use the word "performance". It's likely a stand-in for something else (e.g., "accuracy", "latency", or "F1"). Say what you actually mean.
- Avoid "perform an X" or "conduct X". X is likely a noun form of a verb. Just use that verb. Don't "perform a split" or "conduct cross-validation" or "perform a song". Just "split", "cross-validate", or "sing".
- Use "first" rather than "firstly" (same for second, third, etc.).
- Use "unlike" rather than "different from".
- Never use "utilize"; always use "use". Corollary: prefer "usage" over "utilization", unless it's a technical term. Same thing with "leverage" (unless there's mechanical advantage involved).
- You can always delete "this means".
- You can always replace "in order to" with "to".
- You can replace "X nature" with the noun form of X: "helpful nature" can be "helpfulness" (or "usefulness").
- Never say "In this section, we"; just say "This section" or "We".
- You can always replace "to do this we" with "we".
- Use "while" rather than "whereas".
- Don't say "it is well known"; either cite or just state the fact.
- Don't say "the fact that".
- Prefer "toward" over "towards".
- You can almost always delete "notice that" or "note that". (My dumb fingers are often very guilty of writing this.)
- Use "we" rather than "this paper" or "this work".
- Use "shown" over "denoted" or "listed".
- Use "in addition" rather than "besides".
- Use "certainty" rather than "certitude".
- Unless you're talking about physically knocking someone to the ground, don't use the verb "tackle".
- Try not to use "hedging" language (often, probably, etc.). Either something is true or it isn't. If it really is a mixed bag, quantify the amount (e.g., three out of four dentists).
- Avoid the phrase "it has been shown"; if you have nothing more specific to say about it (e.g., who showed it), then just remove it. It's an agentless appeal to authority.
- Likewise, avoid the phrase "our results", particularly followed by "suggest". If the conclusion is so obvious that it's clear what you're talking about, cut it. If not, then be specific about which of your results mean that: FancyPant's lower latency allows for higher throughput of pants-routing algorithms.
- If British and American English use different words, you should typically prefer the American ("elevator", "stroller", "diaper", "truck"). The exception is when this could lead to confusion when one language's term is ambiguous. In those cases, use the version that removes ambiguity. As a consequence, use the British terms "trousers" (not "pants"), "crisps" (not "chips"), and "cashpoint" (not "ATM"), but use the American terms "pacifier" (not "dummy"), "gasoline" (not "petrol"), "fries" (not "chips"), and "cookie" (not "biscuit"). Most of these examples will not be relevant in academic writing, but "factoid" is one notable example. British speakers view it as something that is untrue. Try to avoid it if possible. (I learned about this too late from KMH.)
- I will default to American spelling. If you send me a nearly complete draft using consistent British spelling, I will gladly keep that convention (being an Economist reader finally pays off). If you use Indian or Canadian spelling, experience has shown that I will turn it into American spelling as I edit (sorry, I just can't keep the intermediate cases straight).
- In academic writing, avoid contractions.
- Generally, use the simpler, more direct word. Sometimes when
writing an academic paper there's a pull to sound "fancy". But it
often makes the writing unclear.
the video content has to be integrated with the headline subtlety while assessing headline veracity
sounds complicated, but it's actually something fairly simple: you have to watch the video to know if the headline is representative or has a subtle exaggeration or misrepresentation.
Paragraphs
- Make sure your paragraphs have topic sentences. Until you have mastered writing paragraphs, you shouldn't break this practice.
- Use transitions liberally. I've yet to see a student use too many transitions. I long for the day when I delete an extra transition.
Writing a Paper Collaboratively
- Often, when computer scientists work together on a paper, they will discuss something using the concept of a "write token". This concept comes from token-ring computer networks, where computers share the same wires to send messages. If two computers tried to send something at the same time, it would cause a problem, so only the computer currently holding the token is allowed to send. The same problem comes up when writing a document; if two people try to edit the document at the same time, it can cause problems.
- Thus, before editing a file shared in some collaborative environment (Dropbox, SVN, etc.), send an e-mail (or some other pre-arranged notification) to your coauthors. Something along the lines of "I'm claiming the write tokens for foo.tex". Then, edit the file to your heart's content, save and commit your changes, and then send an e-mail saying "I release the write tokens for foo.tex". Bottom line: only modify files for which you have the write token, and release tokens for any file that you're not actively editing.
- As a consequence, it helps to "input" many smaller files to create the whole document. This enables co-authors to claim tokens to specific sections on a file-by-file basis. (SVN, CVS, and git allow you to merge files after agreeing to the same delineation of write tokens, but this is dangerous; I discourage it.)
- Make sure that your text editor notifies you when the underlying file
changes. If this doesn't happen, then you might unintentionally clobber
(overwrite) another's edits. This happens as follows:
- you edit a file (with your write tokens)
- you save the file, commit, and release the write tokens
- you leave the file open in your editor
- Collaborator Charlie modifies the files
- you update the file on your disk (but leave the file open in your editor)
- you then make some changes to the file (after getting the write tokens)
- and then you save those files, but because your editor never reflected the underlying changes, everything that Charlie did is forgotten
- you commit the changes, and your version control software thinks that you knew that you were overwriting all of Charlie's changes (because it gave you the new file), and everyone else will also miss out on Charlie's changes
Emacs does notify you when this happens: either use emacs or a text editor that does this.
- When the deadline gets really close, I like to print out a copy of the paper, mark the changes with a pen, and then quickly grab all the tokens for a paper to implement those changes. (If rewrites or large text additions are required, I do those in a scratch file.) I can then very quickly implement my changes while holding the tokens for a short period of time. If time is really short and there are lots of papers going in for a deadline, I may hand you (physically or virtually) the marked up version to implement the changes. This is meant to save time. If anything is unclear, please ask me.
- Only commit the files needed to generate the output. Do not commit the intermediate or final output (pdf, log, aux, etc.). This is not just to save space (every change, no matter how small, would force a multi-megabyte file to be saved) but also because these files can get in the way of other people as they compile the document.
- I also prefer using pdf (vector) and png (raster) files for graphics (as appropriate); this is because the output of pdflatex is usually better/easier than that of latex. This also makes Overleaf compile times faster.
- Organize the files as follows:
- Keep all style files (e.g. things that could be shared between submissions) in the "style" folder
- Keep all bibliography files (e.g., bibtex source) in the "bib" folder
- Have the main body in a file called "YYYY_venue_project.tex" and the associated data (sections, graphics, etc.) in a folder called "YYYY_venue_project"
- This allows the Makefile to work well and reduces clutter
- What about ShareLaTeX/Overleaf? I do not like to
depend on these tools. While they are good for people who are new to
LaTeX or don't have a UNIX environment, I don't think they're a
good tool for me.
- They let you gloss over annoying errors (LaTeX itself should be more intuitive, but Overleaf hides the complexity and thus problems)
- I have scripts to find grammar / stylistic errors that are hard to integrate
- I prefer to search multiple files using grep
- I prefer to have my research code generate figures/data which can be added to paper repository with a single command line (makes replication easier)
- I prefer to use emacs as editor; Overleaf has pretty high latency while typing and lacks keyboard commands for easier editing (it also gets quotes wrong by default)
- If you do use Overleaf, please:
- Organize files in the same way that I do. I should be able to download the source zip file, decompress it, and run the Makefile to compile everything.
- Run my style scripts and address any errors before asking me to look at it (doing this is a good way of making sure it fits in my paper repository).
- Sync with a git / Github repository.
- Incorporate my preamble for LaTeX-based commenting (don't use ShareLaTeX commenting, as that doesn't work with offline editing).
- Reflow all of your text
- Still obey the paper token tracking spreadsheet for exclusive editing (in case someone needs to edit offline on a plane or something).
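Under the file organization above, the main file might be little more than a list of \input commands, so that each co-author can claim the write token for one section file at a time. This is a sketch: the section names, style/preamble.tex, and bib/references.bib are hypothetical placeholders, and the document class and bibliography style should come from the venue.

```latex
% YYYY_venue_project.tex: the main file, named per the convention above.
\documentclass{article}          % swap in the venue's class or style file
\input{style/preamble}           % hypothetical shared macros in the style folder

\begin{document}
\title{Title of Paper}
\author{First Author \and Second Author}
\maketitle

% One file per section, so write tokens can be claimed file by file.
\input{YYYY_venue_project/intro}
\input{YYYY_venue_project/method}
\input{YYYY_venue_project/results}
\input{YYYY_venue_project/conclusion}

\bibliographystyle{plain}        % swap in the venue's .bst file
\bibliography{bib/references}    % hypothetical bib/references.bib
\end{document}
```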
Timeline for Writing a Conference Paper
There are at most five important dates in a paper's life cycle: the anonymity deadline (at least for ACL papers), the submission deadline, the author response deadline, the camera-ready deadline, and the presentation. I'm going to go over each of these deadlines. As a first author, you are responsible for meeting these deadlines and letting your coauthors know about them (including your forgetful professor). I suggest creating a calendar invite (e.g., at UMD, Google Calendar) and inviting all of your coauthors to these deadlines so that they don't forget (this includes, again, your forgetful professor).
Anonymity deadline
This is the timeline if we are submitting to ArXiv before submitting to the conference. If this isn't the case, then these deadlines become the main conference timeline (in addition to those deadlines).
- Three weeks before the deadline: an outline and introduction using my preamble and Makefile; determine the author list and author order (this can change, but it's good to get this down early). If there are less-involved authors, get an affirmative response from each of them that they wish to be authors. If you do not get a reply, they are not an author.
- Two weeks before the deadline: draft of everything but the results
- One week before the deadline: complete draft including results, figures generated by figures.R or figures.py
- Day of deadline: make a social media post talking about the main points of the publication for both a general (top of the thread) and technical (bottom of the thread) audience. Select a good figure to attach to the thread.
Submission deadline
This only applies to papers that were "complete" before this deadline: either ArXiv preprints or resubmissions.
- One month before the deadline: send the paper to people you respect and trust to see if they have comments
- 10 days before the deadline:
- clean up your code and replicate your results (undoing any last-minute hacks)
- 10 days before the deadline: look through other papers to see if there are new things to cite, and remind all authors of the submission and how to contribute.
- Five days before the deadline: incorporate all coauthor comments.
- One day before the deadline: make sure you have a conforming submission, you have filled out all of the metadata, and you have everything ready to submit. If the conference allows you to revise a submission, submit your current draft.
- Go through the submission checklist
Author response deadline
- Once reviews are available: put all of them in a Google Doc.
Structure the Google Doc as follows:
- Response draft comes first. Copy the form and instructions from the conference manager form as closely as possible (list word/character limits, have a single cell table for text boxes to make it clear what gets copy/pasted later).
- Then put the reviews after a page break (Insert -> Break -> Page). You can add comments either as first-order comments in the Google Doc or as differently colored text; either way, make it typographically clear that your comments are not part of the review.
- Title the document as YYYY VENUE AR: Title of Paper
- Share with all co-authors. Specifically for me, please share with ying@umd.edu (and not jbg@umics.umd.edu), as that's associated with a Google account.
- Three days before deadline: Have a draft of the author response ready
- One day before deadline: If site allows resubmission, park a copy of your response
Camera-ready deadline
First, congrats! It's great that your paper has been accepted. Your work isn't done yet. Do not ignore these steps. They are very important for you to take ownership of the paper, make sure that your reputation will be safe, and to keep your advisor and funders happy.
- Immediately (within three days) upon notification:
- Do not create a new Overleaf project. I understand the temptation to track the history, but use the git repository for that.
- Review the comments on the paper from coauthors
- Turn the reviews into todo lists and order everything starting with easy / obvious things and going down the list to impossible / difficult / things you disagree with (call the items in the first two categories TODO)
- Make a list of relevant papers that have come out since you submitted (call this list CITE)
- Add author names and acknowledgements
- Link the todo from the paper token tracker
- Make sure your vector figures are generated programmatically from files in the repository using either ggplot2 or plotnine
- Run the style checker on the source. (This should have been done before, but there's no excuse now.)
- Within five days:
- Do any trivial TODOs
- Discuss prioritization with the senior authors
- Update the related work from the CITE list
- Proofread everything
- Move anything that was cut at the last minute
- Make sure there are no style issues flagged
- Two weeks before the deadline, create a version of the document that:
- Addresses all of the trivial issues from the TODO list and CITE list
- Has everything spell-checked and typos corrected (do not neglect the style check, and review all of this webpage)
- Also has presentation versions of all of the figures in the paper. This may result in better figures overall, so it's good to do it now.
- Has everyone's name added to the author list and spelled correctly (if there's any doubt, ask)
- Uses the right LaTeX style file (it sometimes changes between submission and camera ready)
- Cites any recent work that's relevant and has come out (or that you learned about) since you submitted
- Has no widows or orphans
- In the rest of the time left:
- Send the document to other folks to read over (you must do this if there are any coauthors not deeply involved in the writing process, but also a good idea to send to relevant people who might have feedback on paper, e.g. if you built heavily on someone else's work)
- Carefully incorporate their suggestions; don't let the document leave a camera-ready state
- Work on any of the more difficult TODOs
- Once you submit the camera ready:
- Put it on your webpage and send the final PDFs to the authors
- Add it to my webpage. This happens through a pull request:
- Add a new file in the pubs directory,
with the filename YEAR_VENUE_WORD where WORD is the most
memorable word from your title (see the other filenames for
examples). This file should (see other files for examples):
- Link the project that funds you via the webpage
- Include an abstract that is accessible to the public
- Link to the data on DRUM and/or Github
- Upload the final PDF to the src_docs folder.
- Later on, when we get the acceptance rate, make sure to update that (again with a pull request).
- Update your CV
- Post any associated data and code on your webpage
- Move the paper to our publications repo
- Upload your data to DRUM
Video deadline
Unfortunately, conferences are giving less and less time for this. So if you want to do something ambitious, it's good to have practiced this and have an idea before the deadline is announced. The guidelines here are written with a really short timeline in mind, but hopefully you'll have more time than this.
- Continuously: Get inspired by other videos
- Three weeks before the deadline: Create a script / storyboard for the video in Google Docs (add these links to the paper token tracker). It's okay if it's short. Just hit what you need to do, and share that with all coauthors.
- Two weeks before the deadline: Create slides to accompany the script (if needed). This may require adjusting the script.
- Ten days before the deadline: record the video
- One week before the deadline: Create a rough cut of the edited video in Premiere, share with coauthors (source shared with Google Drive)
Conference presentation
- Three weeks before conference: share draft of poster / slides
- Two weeks: practice presenting
- One week: Revamp social media post from ArXiv deadline, attach to paper token tracker
- Day before: Post reminder of when / where talk is on social media
- Day after: Follow up with the most interested people you interacted with
Writing a Results Section
Here's a recipe for writing your results section:
- Figure out the 3–4 things you'd want people to take away from the results in the paper
- Create subsection titles that summarize those findings
- Each subsection should describe what you did but, more importantly, help the reader interpret the results so that they come to the same conclusions that you did. It's not enough to explain what you did and expect the tables/figures to speak for themselves.
Error Analysis
When writing a paper with an objective measure of success or failure, it is important to include an error analysis. This is helpful for multiple reasons: it shows that you've actually looked at your data, that you understand what your algorithm is doing, and that you know how your method is better than (or at least different from) other algorithms.
For example, let's say that you're doing a paper on word segmentation and you're comparing to the Zigglebottom segmenter. One thing you can do is sort all of your sentences by the following metric: #words you got right - #words Zigglebottom got right. You should look at both extremes. Are there patterns in the things you get right and they get wrong? If so, that likely reveals something about what your algorithm is doing.
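Written out, the sorting key from the Zigglebottom example is a per-sentence difference. Here c(s) is my own notation for the number of words in sentence s that a system segments correctly, and the worked numbers are invented: a sentence where your system gets 9 words right and Zigglebottom gets 6.

```latex
\[
  \Delta(s) = c_{\mathrm{ours}}(s) - c_{\mathrm{Zigglebottom}}(s),
  \qquad \text{e.g.,}\quad \Delta(s) = 9 - 6 = 3
\]
```

Sentences with the largest positive Δ are where your method wins; the most negative are where Zigglebottom wins. Both ends are worth reading.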
Otherwise, if you're writing a paper on a well-known task / method, you risk a reviewer reading it and saying, okay, it does slightly better, but why? Are there pitfalls I should be aware of? Error analysis also makes sure you understand what you're claiming, which allows you to write a better paper overall.
Moreover, good error analysis can improve your model.
Do not make broad claims like "our algorithm works well on long-term promises and apologies" without backing it up. Make it clear that you've done your homework. You can be inspired by examples, but back it up with numbers / statistics from your data: "of messages with X, our method gets Y correct, while humans only get Z".
Writing an Author Response Document
Take a look at this longer document from Devi (I disagree with quoting directly; I prefer paraphrasing).
For some conferences and most journals, you will have the opportunity to respond to the reviews you've received. This often doesn't matter. Many times the reviewers all loved or hated your paper. You don't have much of a chance to change their minds. However, when your paper gets marginally positive reviews or one outlier review, a good response can really help.
First, be as brief and civil as possible. If you don't have much to say, don't try to fill the space. Only use as much of the allotted space as you actually need. If there are no statements of fact you can offer to help clarify a reviewer's opinion, then just let it go. Make sure to thank the reviewers and to be gracious. They may be idiot numbskulls who trashed your paper, but they did so as volunteers. If they didn't understand something, it's usually your fault for not writing it well in the first place.
Second, make your response as self-contained as possible. The PC will read your response by itself. They should be able to get the gist of the reviewer's opinions through your response and understand your reply without looking anywhere else. Obviously you cannot include all necessary details, but make sure your response stands on its own as a single, coherent document.
Reviewer Three questions whether the Pope is Catholic. We follow the convention of Captain Obvious (2003), which has been the foundation of many subsequent papers.
If you don't do this, then the AC has to go back and read the review to know what you're responding to. You want the AC to spend as little time reading this bad review as possible! Let them read your response instead (and it will be more reasonable, better written, nicer, etc.).
Don't get this wrong by writing your response as an inline e-mail. It's an inefficient use of space, and you can frame the issues better than the reviewers did. (It may be helpful to compose the response next to the reviews, but don't assume that the reader will have access to that information.)
One way of structuring paragraphs in a response is:
- What is the complaint / issue (and who raised it)
- Is it true and/or important?
- Give evidence for that position or why you did what you did.
For subjective criticisms, never say your paper is "bad" (i.e., confusing, incremental, etc.). You can agree with the reviewer by saying things like "we appreciate the opportunity to help explain our approach further here in the response and will do X in the revision".
Even if you don't get to write a response, once you get a set of reviews, create a spreadsheet (shared with co-authors on Google Docs) to address the issues that the reviewers found with the following columns:
- Reviewer ID: Even if you know who it is (e.g., from a paper clinic), it's probably better to turn these into numbers.
- Issue: If it's a typo, note the typo. If it's something bigger, like a characterization issue, copy-paste or paraphrase.
- Notes: Write down anything that might be relevant to jog your memory if you come back to it later (e.g., related work, a paper that does this well, etc.).
- Difficulty: On a scale of 0 (fix a typo) to 10 (rewrite all the things!), how hard do you anticipate the fix being? This will become quite important in a few moments, so be honest!
- Fix: How can you fix the issue? A very important note here: "I'm not going to fix it" is a perfectly legitimate response!
- Who: For multi-author papers, have the first author assign responsibility to each of the authors.
- Done: Has this been addressed? For non-trivial stuff, explain how the problem was solved.
Other than organizational fixes (which should be done first), it's often easier to go in increasing order of difficulty. Lots of little fixes can sometimes fix bigger issues as you work your way through. This spreadsheet will also help to write a response document (more important for journals than conferences).
Checklist before submitting something
- Look over your citations. Do they look uniform? Sometimes when you copy/paste a bibtex citation, you get a messy entry. In addition, now is a good time to make sure you followed all of the citation best practices.
- Read over your section titles. Do they tell a story? Don't just use "introduction", "background", "model", "inference", "conclusion", etc. While your sections may map to those, call them something more specific.
- Make sure your section and subsection titles have correct and consistent capitalization (check the template for examples)
- Do your figures stand on their own? Make sure a bored reader / reviewer can get the gist of what's going on through the figures alone. So if your first figure is about the data you use, make sure that motivates the method you'll use. If you have a figure of the graphical model, don't just say "our graphical model"; instead, point out the salient aspects of the model that make it interesting to a reader under time pressure. If you have a figure of results, don't just say "summary of results"; instead, point out what the results mean in the caption. You'll say something similar in the text, so don't repeat yourself. Likely there are different interesting trends you can point out.
- Is your abstract persuasive and interesting? When an area chair reads it, will they send it to the reviewers that you want?
- Do a spellcheck!
- If you have one subsection, you must have at least two
- Double check that you're using the correct style file; put it next to other submissions and make sure they look the same (same numbers of lines per page, etc.).
- Make sure you've spelled your co-authors' names correctly (not always obvious; perhaps they use middle initials or a different name / spelling for publishing) and you have the correct affiliation (did they move since you first wrote the paper)?
Whew, I just submitted! Time to forget about the paper until reviews come back ...
Not so fast! You likely cut some corners in submitting the paper. While things are still fresh in your brain, fix things now so that you can be prepared when the paper is accepted or you work toward resubmission. Sometimes the submission site will open back up; be prepared to take advantage of that. Or you may want to post on ArXiv after submission/rejection; fix these problems first.
- If you disobeyed the style guidelines on this page, fix those issues now. Common problems: figures that aren't auto-generated, missing explicit lists of contributions, not running the style-checking script, non-vector figures, ugly tables, ugly bibliographies, etc. You'll have to do it eventually; save yourself some hassle and get it done now.
- Add in error bars (e.g., multiple runs) and hyperparameter sensitivities now that the cluster is free again.
- Document your code for release, make sure you can rerun all of your experiments with a single push of a button.
- Address all of my remaining comments and see if there are people you should send it to for review (if nothing else, pass it around CLIP).
Camera Ready
First, congrats! It's great that your paper has been accepted. Your work isn't done yet. Do not ignore these steps. They are very important for you to take ownership of the paper, make sure that your reputation will be safe, and to keep your advisor and funders happy.
There is no excuse for leaving this until the last minute. You've already done the work, and this is the last chance to get it right. I've increasingly had the issue of having students ignore this.
- Immediately (within one week), do the following:
- Have your advisor give notes on the current draft of the paper
- Add acknowledgements to anyone who helped you at any point while working on the paper. Read an early draft, provided data, provided help, had interesting discussions with, etc.
- Add an appropriate acknowledgement to the grant funding you (use previous papers as an example). For my students, find your project on my projects page, ask about anything you're unsure of.
- Turn the reviews into todo lists and order everything starting with easy / obvious things and going down the list to impossible / difficult / things you disagree with (call the items in the first two categories TODO)
- Make a list of relevant papers that have come out since you submitted (call this list CITE). Look through other papers accepted to the same conference, look at people you cited in the submission (have they done any followups), do a search on Google Scholar and ArXiv. You always forget to cite something, so I will be very suspicious if your references don't change going into the camera ready.
- Move the paper to a public publications repo to make it easier for people to read drafts and make suggestions.
- Within two weeks, meet with your advisor to select:
- What things out of TODO are trivial
- See if they have other things to add to the list
- What citations to add from your CITE list
- Within three weeks (or at least two weeks before the camera ready deadline), create a version of the document that:
- Addresses all of the trivial issues from the TODO list and CITE list
- Has everything spell-checked and typos corrected
- Has everyone's name added to the author list and spelled correctly (if there's any doubt, ask)
- Obeys the space limits
- Uses the right style file (it sometimes changes between submission and camera-ready)
- Has no widows or orphans
- More broadly, the paper should look like the published papers on my webpage (or, hopefully, better). Every commit should be something that can be posted immediately. Do not let the paper leave a camera-ready state.
- In the remaining time:
- Send the document to other folks to read over (you must do this if there are any coauthors not deeply involved in the writing process)
- Carefully incorporate their suggestions, but never let the document leave a camera-ready state
- Work on any of the more difficult TODOs
- Once you submit the document:
- Put it on your webpage and send the final PDFs to the authors
- Notify your department's publication mailing list
- Update your CV
- Tweet about the paper (mention me and I'll retweet)
- Make sure the final PDF is PDF/A compliant, either by including the appropriate LaTeX package (preferred) or by converting it with Acrobat, a web tool, or Ghostscript.
- Post any associated data and code on your webpage
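For LaTeX papers, two of the checks above (PDF/A output and no widows or orphans) can be partly handled in the preamble. What follows is a minimal sketch. It assumes the pdfx package for PDF/A and uses TeX's standard widow and club penalties. The title, author names, and PDF/A level (a-2b) are placeholders. The venue's style file always takes precedence, and some venues forbid changing these settings.

```latex
% Minimal camera-ready sketch; the metadata below is a placeholder.
% pdfx reads document metadata from \jobname.xmpdata, so write it first.
\begin{filecontents*}{\jobname.xmpdata}
\Title{Placeholder Paper Title}
\Author{First Author\sep Second Author}
\end{filecontents*}
\documentclass{article} % in practice, the venue's class or style file
% Load pdfx early: it produces PDF/A output (level 2b here) and loads hyperref.
\usepackage[a-2b]{pdfx}
% A penalty of 10000 forbids TeX from leaving a widow (a paragraph's last
% line alone at the top of a page) or a club line, i.e., an orphan (a
% paragraph's first line alone at the bottom of a page).
\widowpenalty=10000
\clubpenalty=10000
\begin{document}
Camera-ready text goes here.
\end{document}
```

Forbidding widows and orphans outright can leave underfull pages, so still check the final PDF by eye.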
Talks
- Read these general advice pieces from Will Ratcliff, Robert Geroch and Jason.
- Use Beamer and OmniGraffle (but keep your Beamer slides free of navigation and section clutter).
- Don't use a laser pointer (integrate any emphasis into your slides). People tend to overuse laser pointers, and they are not always usable (because of the screen, multiple screens, etc.).
- Don't read from notes (and don't use your slides as notes). If you feel uncomfortable delivering the talk, it means you need to practice more.
- Whenever you can use a picture instead of an equation, choose the picture (even if it takes more time).
- Think about your color palette. Have one or two main colors and use them everywhere: figures, pictures, bullets, titles, backgrounds (schools / labs likely have their own favored colors ... just use those and don't think about it).
- You're likely going to adapt figures from your paper. That's fine, but talks are a different medium. Usually, you want to gradually reveal aspects of the figure so you can focus attention on what you're talking about. Usually, you'll show far less information than you did in your paper. You almost never want to copy a big results table verbatim from your paper.
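Several of the talk suggestions above correspond directly to Beamer settings. This is a minimal sketch with two assumptions. First, the two-color palette uses placeholder RGB values, so substitute your institution's colors. Second, a figure from your paper has been split into hypothetical files figure-part1, figure-part2, and figure-full, so that it can be revealed one step at a time.

```latex
\documentclass{beamer}
% Hide the navigation symbols Beamer draws in the bottom-right corner.
\setbeamertemplate{navigation symbols}{}
% Placeholder two-color palette; replace with your institution's colors.
\definecolor{maincolor}{RGB}{200,30,45}
\definecolor{accentcolor}{RGB}{80,80,80}
\setbeamercolor{structure}{fg=maincolor}
\setbeamercolor{alerted text}{fg=accentcolor}
\begin{document}
\begin{frame}{Reveal a figure one piece at a time}
  \centering
  % Each overlay specification shows one image on one slide, so this
  % frame produces three slides; the file names are hypothetical.
  \includegraphics<1>[width=0.8\textwidth]{figure-part1}%
  \includegraphics<2>[width=0.8\textwidth]{figure-part2}%
  \includegraphics<3>[width=0.8\textwidth]{figure-full}
\end{frame}
\end{document}
```

Splitting the figure into separate files is only one option. Beamer's overlay specifications also work on `\only` and `\uncover` if you prefer to build the figure in TikZ.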
Posters
Here's a video that proposes something very different from what we normally do. I'm not sure I agree with or endorse it, but it's a good way of thinking about the failure modes. We should do better at conveying the main points and at avoiding clutter. It is very useful to decide what your giant takeaway message is; that message should probably be in giant font on your poster.
Without adopting this format completely, I would suggest the following:
- The title (the really big text) doesn't need to match your paper's title. Think about what's relevant to the audience and what's new about your paper (just like in the video above).
- Optimize your poster for skimming. Don't have too much text. Most things should be pictures / diagrams (with big fonts).
- Have pictures for anything that you might mention.
More generally, optimize your poster for both the average case (someone standing there for four minutes) and the tails (a world expert who wants to drill into details).
- Practice a four-minute version of your talk (no more!) that gives the highlights of your paper and teases the additional details you can go into if people want to have a conversation. Figure out the slides you'd want for that four-minute version and make them the centerpiece of your poster.
- Read this general advice
- Use Beamer
- Use blocks with informative titles (just like section headings)
- Do not put two images / tables in a block unless they are showing very related information (e.g., don't put results from two different datasets in the same block).
- Use a color scheme consistent with your institution; look at previous posters from the group before you begin.
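A Beamer poster built with the beamerposter package might be organized as follows. This sketch assumes an A0 landscape poster with three columns. The size, scale, block titles, and image file names are all placeholders, so adapt them to the conference's requirements and your group's previous posters.

```latex
\documentclass{beamer}
% beamerposter scales Beamer up to poster size; A0 landscape is a placeholder.
\usepackage[orientation=landscape,size=a0,scale=1.4]{beamerposter}
\setbeamertemplate{navigation symbols}{}
\begin{document}
\begin{frame}[t]
  \begin{columns}[t]
    \begin{column}{0.3\linewidth}
      % Each block holds one idea, and its title states that idea.
      \begin{block}{Placeholder: the one-sentence takeaway}
        \includegraphics[width=\linewidth]{takeaway-diagram}
      \end{block}
    \end{column}
    \begin{column}{0.3\linewidth}
      % Results from different datasets get separate blocks.
      \begin{block}{Placeholder: results on dataset A}
        \includegraphics[width=\linewidth]{results-dataset-a}
      \end{block}
    \end{column}
    \begin{column}{0.3\linewidth}
      \begin{block}{Placeholder: results on dataset B}
        \includegraphics[width=\linewidth]{results-dataset-b}
      \end{block}
    \end{column}
  \end{columns}
\end{frame}
\end{document}
```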
Videos
Even before the coronavirus pandemic, conferences were asking participants to make video teasers for their papers. This is also a handy way to practice a short pitch for your poster. Now, many talks are given only through videos. I think this is a great opportunity, as I describe in this video:
- Figure out who is going to be in the video. If professors are involved, find a time now because scheduling may prevent them from taking part.
- Write a script first. It takes longer if you ad lib and screw things up constantly. Don't assume that you'll do it correctly in one take: you'll record each section multiple times and take the best one.
- Be completely sure that your script fits into the time allowed. It makes life very hard for the editor if they have to cut actual content.
- If you're pressed for time and at UMD, you may want to record in a one button studio.
- Make sure you have good lighting. Try to find an east-facing window in the morning or a west-facing window in the afternoon. Put the camera a little bit above eye level. You don't want to record from a laptop on a desk: good for working, but not for creating a video. A laptop on a big pile of books is good, though.
- If you can, you also want to have "bounce" lighting from below. You can get this by putting a white tablecloth (or sheet) on the desk in front of you.
- This video is a good discussion of how to fix problems in videos. However, the moral of the story should be to get it right during filming; it's far less work!
- While it's good to script videos, the actual reading can feel too scripted. Think about the emotion of the lines as you read them, and memorize enough that you can keep looking at the camera while you deliver them.
- Apart from investing in a smartphone teleprompter, one way to deliver a script naturally is to pair up with someone else. Thanks to editing magic, you can make this look natural as follows:
- The idea is to composite the footage so that it looks like people are talking to each other in the same room. Agree with your conversation partner on who is on the left and who is on the right, and on the height of the camera from the ground (probably 2m-ish). This can be done either by splicing or with a green screen. Splicing could work well if people can agree on similar backgrounds (e.g., water / mountains / blank white wall).
- Put the script far enough away from your face that the camera can't see your saccades as you read.
- Slides should be in 4:3 format so that we can insert them between the people in the YouTube video (which is 16:9 by default).
- I think it's better if people are standing, but only do that if you're comfortable. Have a stuffed animal or something where your virtual conversation partner will be so that you can make "eye contact" with them.
- Do the actual recording offline on a laptop or phone (if you don't have a camera and tripod). Don't use a laptop's internal camera (more on this below). If you're recording with someone else, keep the reactions natural by calling the other person on a *separate* device while you both record simultaneously.
- Stop every five minutes or so to make sure the recording has worked okay (both video and sound).
- If you're not trying to merge your video with someone else's video, have an interesting but not distracting background. Shelves full of books, nature scenes, or cityscapes work well. (You can also have a blank "void" as a background, but I think a more interesting background works better.)
- If you don't like your background, don't use polygons to cut it out.
- If you want to remove the background, use a green screen (you can make one fairly cheaply with a green sheet).
- Make sure your background doesn't match the color of your shirt, hair, or skin.
- Record at a higher resolution so that you can use different zooms to add a little variety to the final edited version. Zoom in on whoever is speaking, and vary the zoom level to hide cuts made for content or mistakes.
- Have a good condenser microphone either in front of you or on your lapel / collar. These are usually not expensive, but they are a fantastic investment: a big, bulky, cheap microphone is way better than what's built into your computer (size matters). Some microphones require a battery; if so, make sure you have the correct battery and that the microphone is turned on. Also make sure that there are no sounds in the background (construction, cat, etc.).
- Wear darker colors (orange, brown, grey) and make sure your clothes are solid colors (no plaid, etc.).
- For research teasers, keep the video to three minutes or less.
- Create the words first, then back up the words with images. For example, if you're talking about pottery from Delft and comparing it with a collection in the Forbidden City, show them!
- Don't limit yourself to slides; use film / animations if you can, and make full use of the medium! Similarly, you can get many people to make clips for your video, which divides up the work and adds variety!
- Look directly at the camera: be natural on the camera and make sure you use the full intonation of your voice. Practicing reading books to young children helps with this.
- Record yourself saying every sentence multiple times so you have options. Pick the one that worked best.
- Talk naturally, but with a little more energy than you would normally.
- Record the slides using a screen recorder (e.g., OBS) and combine them with your face later.
- I prefer placing slides over the presenter rather than putting the presenter(s) inside the videos with the slides.
- Please edit your video with Adobe Premiere (free for UMD students) so that I can tweak the end result.
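If you make the slides in Beamer, the 4:3 requirement above is a class option. Beamer's default is already 4:3, but stating it explicitly guards against a template that switches to 16:9 (`aspectratio=169`). Here is a minimal sketch; the frame content is a placeholder.

```latex
% Force 4:3 slides so they fit between the speakers in a 16:9 video.
\documentclass[aspectratio=43]{beamer}
\setbeamertemplate{navigation symbols}{}
\begin{document}
\begin{frame}{Placeholder slide}
  Placeholder content.
\end{frame}
\end{document}
```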
Personal Statements
- No sentence should be something that more than five people could write. If one is, either it's not specific / detailed enough, or it's a cliché.
Creating a Dissertation Document
If you're a student in CLIP, don't forget to review the wiki.
- Introduction
Now, it may seem that your PhD was just bouncing around between a set of different topics / projects. But there is often a reason that you moved from one project to another, and with the advantage of hindsight, you can tell a story about why that happened.
Think creatively about a narrative that connects all of your work together. It may not be chronological! It's possible that later work helps explain why earlier work was a good idea in the first place; it's fine to order things differently, and the introduction is a way for you to explain how things actually fit together.
Don't put too much background material in the intro chapter. That comes later. It's good to have citations, but the traditional CS citations can go in the next chapter. Challenge yourself to cite papers and people you haven't cited before.
Pretend that you're a historian piecing together your papers. You believe there was a reason this great scientist worked on X, Y, and Z. Come up with a good story about it!
- Background
The background chapter is in many ways the most important chapter. Write it as if you had a time machine and were going to send a document back to yourself to speed along your PhD. What are all of the things you would need to know to get it done in, say, a year? This is what should go in the background chapter: all of the stuff you wish you had known before you did all this research but that you didn't learn as an undergrad.
However, this isn't just a list. You need to figure out how to explain it well. Are there connections between these things that you hadn't seen before? Don't just regurgitate the places you learned it from; write it in the way that makes sense for your dissertation.
For example, Mohit Iyyer created a bunch of reference implementations of deep learning models for text (circa 2015) and wrote a background chapter that covered all of them with coherent, consistent notation, organized around a sentiment analysis / political polarity example. This was both coherent and concrete.
- Copy/Paste
After you get to the content chapters, you may think that you're home free. Just copy/paste the papers you've written, right? While there is a lot of content you can reuse, I'd suggest the following:
- Delete your introduction and conclusion
- Make all of the notation consistent between chapters (corollary: make notation between papers as consistent as possible)
- Replace every reference to "this paper" with "this chapter"
- Replace references to your other papers with the appropriate chapters, and provide more contrast than you would in a paper.
- Your background section is tricky: most of it should be in your background chapter. If there's something that isn't in the background chapter for a good reason, think carefully about where it should go: another chapter (e.g., it's more connected to another paper) or this chapter (e.g., a section devoted just to that concept).
- You can never say "because of space restrictions" in a thesis. Many of the things that were in an appendix should now be in the main text. Only if they're really boring should they not go in (but if they're important, try to make them interesting!).
- Conclusion: Pretend that you're starting up a new research group at a university/company. You have a five person team that you need to provide with work. What directions would you extend from your research?
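One way to keep notation consistent between chapters (and between the papers they came from) is to define every symbol as a macro in one shared file. Each paper and the dissertation would then load that file with `\input`. Here is a sketch that uses hypothetical macro names. The definitions are inlined in the preamble so that the example compiles on its own.

```latex
\documentclass{report}
% In practice, move these definitions to a shared notation.tex and
% \input{notation} from every paper and from the dissertation.
\newcommand{\vocab}{V}            % vocabulary (hypothetical symbol)
\newcommand{\numtopics}{K}        % number of topics (hypothetical symbol)
\newcommand{\docidx}[1]{d_{#1}}   % the #1-th document (hypothetical symbol)
% The papers would define this as "this paper"; the dissertation uses
% "this chapter", so the prose changes in one place.
\newcommand{\thiswork}{this chapter}
\begin{document}
\chapter{Placeholder Chapter}
In \thiswork{}, each document $\docidx{i}$ mixes $\numtopics$ topics over a
vocabulary $\vocab$.
\end{document}
```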