This is a subset of the blog authorship corpus (https://u.cs.biu.ac.il/~koppel/BlogCorpus.htm).

Directory layout:
- Each top-level directory corresponds to one blog.
- Each blog directory has one subdirectory per day.
- A day subdirectory may contain multiple posts. Each post is identified by an underscore followed by a number.
- Each post has several files with different extensions. The extension denotes the file's content, as described below.

Original text (txt):
Each line is one text tile (https://web.archive.org/web/20201026221833/https://people.ischool.berkeley.edu/~hearst/research/tiling.html).

Persuasion tactics:
There is one file per tactic, named by the tactic's extension. Each line marks the presence (1) or absence (0) of that tactic in one tile; line i refers to the tile on line i of the txt file. Tactic extensions:
- Consistency
- Empathy
- FavDebts
- Outcome
- Principles
- Reasoning
- Recharacterization
- Scarcity
- SocPressure
- Traits
- VIP

Persuasion:
Each file contains a single line indicating whether the post as a whole contains persuasion. Extensions:
- PERSUASION_action
- PERSUASION_belief
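A minimal Python sketch of walking this layout and reading one post's files. It assumes each post's files are named `<name>_<number>.<extension>` inside `<blog>/<day>/` directories, and that the single line in each PERSUASION_* file is `0` or `1`; neither detail is stated above, so both are assumptions. The names `iter_posts`, `load_post` and `read_lines` are hypothetical helpers, not part of the corpus.

```python
import re
from pathlib import Path

# Tactic and post-level persuasion extensions, as listed in this README.
TACTICS = ["Consistency", "Empathy", "FavDebts", "Outcome", "Principles",
           "Reasoning", "Recharacterization", "Scarcity", "SocPressure",
           "Traits", "VIP"]
PERSUASION = ["PERSUASION_action", "PERSUASION_belief"]

# Assumption: a post's file stem ends in "_<number>", e.g. "post_3".
POST_ID = re.compile(r"_\d+$")


def iter_posts(root: Path):
    """Yield the extension-less path of every post under root/<blog>/<day>/."""
    for txt in sorted(root.glob("*/*/*.txt")):
        stem = txt.name[: -len(".txt")]
        if POST_ID.search(stem):
            yield txt.parent / stem


def read_lines(path: Path) -> list[str]:
    return path.read_text(encoding="utf-8").splitlines()


def load_post(post: Path) -> dict:
    """Read tiles, per-tile tactic labels and post-level persuasion labels."""
    def sibling(ext: str) -> Path:
        # Build "<stem>.<ext>" directly rather than with Path.with_suffix,
        # which would replace part of a stem that already contains a dot.
        return post.parent / f"{post.name}.{ext}"

    tiles = read_lines(sibling("txt"))

    tactics = {}
    for name in TACTICS:
        path = sibling(name)
        if not path.exists():  # skip tactics with no file for this post
            continue
        labels = [int(line) for line in read_lines(path) if line.strip()]
        # Line i of a tactic file refers to tile i, so the counts should match.
        if len(labels) != len(tiles):
            raise ValueError(f"{path}: {len(labels)} labels for {len(tiles)} tiles")
        tactics[name] = labels

    persuasion = {}
    for name in PERSUASION:
        path = sibling(name)
        if path.exists():
            # Assumption: the single line is "0" or "1".
            persuasion[name] = int(read_lines(path)[0])

    return {"tiles": tiles, "tactics": tactics, "persuasion": persuasion}
```

Usage, with a placeholder root path: `for post in iter_posts(Path("path/to/corpus")): record = load_post(post)`. Missing label files are skipped rather than treated as errors, since it is not stated whether every post carries every extension.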