# Content
This data set consists of 20,000 messages taken from 20 Usenet newsgroups.
One thousand articles were taken from each of the following newsgroups:
- alt.atheism
- comp.graphics
- comp.os.ms-windows.misc
- comp.sys.ibm.pc.hardware
- comp.sys.mac.hardware
- comp.windows.x
- misc.forsale
- rec.autos
- rec.motorcycles
- rec.sport.baseball
- rec.sport.hockey
- sci.crypt
- sci.electronics
- sci.med
- sci.space
- soc.religion.christian
- talk.politics.guns
- talk.politics.mideast
- talk.politics.misc
- talk.religion.misc
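
Approximately 4% of the articles are crossposted, that is, posted to more than one newsgroup. The articles are typical postings and thus have headers including subject lines, signature files, and quoted portions of other articles.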
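
The figures above can be checked against a local copy with a short script. The following is a minimal sketch, not an official loader. It assumes the archive has been unpacked so that each newsgroup is a directory holding one file per article, and that each article begins with a header block containing a `Newsgroups:` line. Neither the layout nor the helper names `read_newsgroups_header` and `summarize` come from the dataset description.

```python
from collections import Counter
from pathlib import Path


def read_newsgroups_header(path):
    """Return the groups named in an article's 'Newsgroups:' header line.

    Assumption: the header block ends at the first blank line and the
    'Newsgroups:' value is a comma-separated list on a single line.
    """
    # latin-1 decodes any byte sequence, which suits old Usenet postings.
    with open(path, encoding="latin-1") as fh:
        for line in fh:
            if not line.strip():
                break  # blank line: end of headers, stop scanning
            if line.lower().startswith("newsgroups:"):
                value = line.split(":", 1)[1]
                return [g.strip() for g in value.split(",") if g.strip()]
    return []


def summarize(root):
    """Count article files per newsgroup directory and the crossposted share.

    Assumption: root/<newsgroup>/<article-file> layout (hypothetical).
    """
    per_group = Counter()
    crossposted = 0
    for path in sorted(Path(root).glob("*/*")):
        if not path.is_file():
            continue
        per_group[path.parent.name] += 1
        # An article is treated as crossposted if it lists several groups.
        if len(read_newsgroups_header(path)) > 1:
            crossposted += 1
    total = sum(per_group.values())
    share = crossposted / total if total else 0.0
    return per_group, share
```

Calling `summarize("20_newsgroups")`, where the directory name is a placeholder, returns a per-newsgroup file count and the fraction of files whose header lists more than one group. The fraction is computed per file, so if a crossposted article is stored once in every group it was posted to, each copy is counted.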
# Source
[Twenty Newsgroups Data Set - UCI](https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups)
**Original Owner and Donor**
Tom Mitchell
School of Computer Science
Carnegie Mellon University
tom.mitchell@cmu.edu
# License
CC-BY-SA-4.0