Repository of Censored and Sensitive Chinese Keywords
December 12th, 2014
Via: Citizen Lab:
…we have collected 13 lists of sensitive Chinese keywords and aggregated them into a single, sortable, and shareable CSV file (see a Google Docs sample, sorted by the number of lists each keyword appears on). This file, along with a description of the 13 lists and their sources/origins, is located in a GitHub repository that will be updated as new Chinese keyword lists are identified.
The 13 lists contain 9,054 unique keywords, including those in Chinese, English, pinyin, or a combination of the three. The lists go back as early as 2004 (the leaked Tencent QQ blacklist) and were produced as recently as November 2014 by Citizen Lab collaborator Jeffrey Knockel (University of New Mexico), who extracted 910 keywords from Sina Show.
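
For readers curious how several keyword lists might be merged into one CSV sorted by the number of lists each keyword appears on, here is a minimal Python sketch. It is not Citizen Lab's actual script: the function names, the column names (`keyword`, `list_count`, `lists`), and the sample list names and placeholder keywords are all hypothetical, chosen only for illustration.

```python
import csv
import io
from collections import defaultdict


def aggregate_keyword_lists(lists):
    """Merge named keyword lists into rows of (keyword, list_count, lists).

    `lists` maps a list name to an iterable of keywords. Rows are sorted
    by the number of lists a keyword appears on (descending), then by keyword.
    """
    sources = defaultdict(set)
    for list_name, keywords in lists.items():
        for keyword in keywords:
            keyword = keyword.strip()
            if keyword:  # skip blank entries
                sources[keyword].add(list_name)

    rows = [
        (keyword, len(names), ";".join(sorted(names)))
        for keyword, names in sources.items()
    ]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


def to_csv(rows):
    """Render the aggregated rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["keyword", "list_count", "lists"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data: three tiny lists with placeholder keywords.
sample = {
    "list_a": ["alpha", "beta", "gamma"],
    "list_b": ["beta", "gamma "],
    "list_c": ["gamma", ""],
}

rows = aggregate_keyword_lists(sample)
print(to_csv(rows))
```

Using a set per keyword means a keyword repeated within one list is counted only once for that list, so `list_count` reflects how many distinct lists contain it.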
