2-Corpora
What is a corpus?
1. Corpora
Def: a corpus (plural: corpora) is a large or complete collection of writings
For NLP, we need corpora with linguistic annotations
Markup formats: XML, JSON, CoNLL-style
Example corpora: Brown, WSJ, ECI, BNC, Redwoods, Gigaword, AMI, Google Books N-grams, Flickr 8K, English Visual Genome
Why do we need corpora?
manual rules or database (rule-based, symbolic, knowledge-driven)
learning: provide example input/output pairs for supervised learning
2. Sentiment Analysis
Goal: predict the opinion expressed in a piece of text (e.