What are the most important articles in Wikipedia?

Wikipedia has 4,362,397 articles in English. But how many of those are seriously encyclopedic, and which are the most important?

We’ve been looking closely at Wikipedia for an upcoming app, and we wanted to know which articles matter most. We calculated an importance score for every article from five factors: how richly linked the article is within Wikipedia (the number and quality of links to the page), how many languages it has been translated into, the brevity of its title, how popular it is (web hits), and how many citations/references it has (scholarliness).
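To make the idea concrete, here is a minimal sketch of how five factors like these could be folded into one score. It is not our actual formula: the weights, the log scaling, the 1/length brevity term, and all names (`ArticleStats`, `importance`, `rank`) are assumptions chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ArticleStats:
    title: str
    link_score: float  # assumed: some PageRank-like measure of inbound links
    languages: int     # number of language editions
    page_views: int    # web hits over some period
    references: int    # citation/reference count


# Hypothetical weights; the post does not publish the real ones.
WEIGHTS = {"links": 0.4, "languages": 0.2, "brevity": 0.1, "views": 0.2, "refs": 0.1}


def importance(a: ArticleStats, weights=WEIGHTS) -> float:
    """Weighted sum of damped features (an assumed form, not the real formula)."""
    features = {
        "links": math.log1p(a.link_score),
        "languages": math.log1p(a.languages),
        "brevity": 1.0 / max(len(a.title), 1),  # shorter title, higher value
        "views": math.log1p(a.page_views),
        "refs": math.log1p(a.references),
    }
    return sum(weights[name] * features[name] for name in weights)


def rank(articles, weights=WEIGHTS):
    """Return article titles ordered from most to least important."""
    ordered = sorted(articles, key=lambda a: importance(a, weights), reverse=True)
    return [a.title for a in ordered]
```

The log scaling is one way to keep a single huge number (page views, say) from swamping the other factors; any other normalization would do as long as it is applied consistently.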

The following are our results. The ranking is arbitrary but interesting, so we wanted to share it:

Top 100 English Wikipedia articles:

France, Germany, Canada, Australia, England, United_States, China, Japan, Russia, London, Italy, India, Animal, Poland, Brazil, Iran, Spain, California, Romania, Europe, Mexico, Sweden, Scotland, Switzerland, Netherlands, Turkey, Israel, Paris, Philippines, Pakistan, Norway, United_Kingdom, Insect, Indonesia, Denmark, Greece, Arthropod, Belgium, Chicago, Syria, Texas, Argentina, Marriage, Singapore, Egypt, Malaysia, Austria, Ukraine, Taiwan, Virginia, Islam, Wales, Finland, Florida, Ireland, Philadelphia, Portugal, Rome, Azerbaijan, Afghanistan, Latin, Bird, Boston, Pennsylvania, YouTube, Hungary, Serbia, Vietnam, Berlin, Plant, Quebec, Buddhism, Croatia, Massachusetts, Christianity, Bulgaria, World_War_II, Thailand, Facebook, Protein, Earth, Africa, Chile, Village, Species, Iraq, Colombia, Burma, Slovenia, Toronto, Moscow, Cuba, Mathematics, BBC, Montreal, Fungus, Peru, Chordate, Estonia

101-200:

Jesus, Jews, Nigeria, Lepidoptera, Ontario, Slavery, Ohio, Sydney, Illinois, Napoleon, Basketball, Melbourne, Maryland, Internet, Human, Tokyo, Jazz, Lebanon, Mumbai, Nepal, Istanbul, Bangladesh, Agriculture, Google, Asia, Seattle, Hawaii, Beijing, Warsaw, Iceland, Athens, Philosophy, Venezuela, Atlanta, Michigan, Jerusalem, English_language, Detroit, Cyprus, Guitar, Ethiopia, Vienna, NASA, Kenya, Mollusca, Morocco, Minnesota, Cricket, Association_football, Hinduism, Slovakia, Oxygen, Amsterdam, Bacteria, Algeria, Enzyme, Manhattan, Microsoft, Prague, Alaska, Edinburgh, Television, Belarus, Judaism, Milan, Kerala, Latvia, Vancouver, Mammal, Census, Tennis, DNA, Madrid, Economics, New_York_City, Houston, Oregon, New_Zealand, Baseball, Cancer, Copenhagen, Moon, Barcelona, Dublin, NATO, Manchester, Armenia, Wisconsin, Lithuania, Liverpool, Protestantism, Gene, Madagascar, Indiana, Ecuador, Muhammad, Gold, Sun, Law, Alabama

201-300:

Hangul, Renaissance, Nazism, Physics, Linux, Bible, Budapest, Water, Hydrogen, Albania, Malta, Baltimore, City, Science, Louisiana, Colorado, Birmingham, Soviet_Union, Antarctica, Stockholm, Jordan, World_War_I, Uruguay, Evolution, HIV/AIDS, Jamaica, Singing, Communism, Somalia, Glasgow, Education, Tanzania, Bolivia, Film, Arizona, Pittsburgh, Kentucky, Libya, Luxembourg, Missouri, Wikipedia, Connecticut, Tuberculosis, Ghana, Euro, Kolkata, Sociology, Alberta, Psychology, Twitter, Novel, Sanskrit, Oklahoma, Zimbabwe, Socialism, Shanghai, Kazakhstan, Aristotle, Anime, UNESCO, Dallas, Religion, Dubai, Dog, Ottawa, Mars, Yemen, Venice, Hamburg, Sicily, South_Africa, Greenland, Delhi, Copper, Asteroid, Biology, Quran, Fish, Los_Angeles, Rice, Munich, Seoul, Catholic_Church, CBS, Watt, Chennai, Miami, Cambodia, Archaeology, Actor, Tennessee, Belgrade, Tunisia, New_York, Atheism, Pope, Christmas, Cameroon, Genus, Vermont

301-400:

Computer, Caribbean, Brooklyn, European_Union, Democracy, Oslo, Utah, DVD, Iron, Bangkok, Florence, Ecology, Aluminium, History, Frog, Music, Moldova, Chemistry, Horse, Language, God, Sudan, Mongolia, Iowa, Uganda, Denver, Austria-Hungary, Lisbon, Automobile, Qatar, Jakarta, Naples, Nevada, Maize, Panama, Fascism, Maine, Kuwait, Arkansas, Cat, Malaria, Haiti, Medicine, Augustus, Star, Kiev, Dinosaur, Hindi, Beetle, Mississippi, Newspaper, San_Francisco, Lutheranism, Sugar, Amphibian, Moth, Brussels, Damascus, Muslim, Album, Cleveland, Piano, Bahrain, Midfielder, Reptile, Eminem, Nicaragua, Cairo, Hong_Kong, Plato, Korea, Germans, Culture, Maharashtra, IBM, South_Korea, Bristol, Petroleum, Homosexuality, NBC, Minneapolis, Macau, Guatemala, Angola, Monaco, Uzbekistan, Manitoba, Manila, Bavaria, Karnataka, United_Nations, Astronomy, Tree, River, Namibia, Belfast, Kansas, Spanish_language, Poetry, Geneva

401-500:

University, Americas, Frankfurt, Laos, Charlemagne, Electron, Al-Qaeda, Population, Queensland, Virus, Bangalore, Brisbane, Engineering, Blues, Wheat, Submarine, Hollywood, Barack_Obama, Calgary, Cornwall, Sri_Lanka, IPhone, Poverty, Cologne, Blog, Chess, Atom, Steel, Scandinavia, Cardiff, Snake, Shiva, Helsinki, Carbon, Rock_music, Globalization, Zinc, Suicide, Prussia, Mali, Catholicism, Roman_Empire, Fruit, Linguistics, Manga, Fiji, Middle_Ages, Eukaryote, Radio, Brain, Tehran, Canberra, Edmonton, Milk, Coal, Perth, Alps, Liberia, Stroke, Kosovo, Coffee, Anthropology, Cincinnati, Theology, Municipality, Lion, Pneumonia, Crusades, Hertz, Government, Catalonia, Montenegro, Capitalism, Milwaukee, Cattle, Honduras, Wyoming, North_America, Mauritius, French_language, Oman, Food, Electricity, Bucharest, Volleyball, Vikings, Christian, Auckland, Sheep, Lawyer, Liberalism, Telecommunication, Tourism, Ethanol, Elephant, Gujarat, Winnipeg, Kyrgyzstan, Gibraltar, Earthquake

501-600:

Volcano, Paraguay, Feminism, Turin, Sculpture, MTV, Lake, Senegal, Freemasonry, Painting, Butterfly, Beirut, Saskatchewan, Jupiter, Bhutan, Boxing, Advertising, Silver, Marxism, HIV, Adelaide, Siberia, Marseille, Czechoslovakia, Ottoman_Empire, Brunei, Nebraska, Karachi, Gastropoda, Golf, Urdu, Idaho, Constantinople, Forest, Wine, Mesopotamia, Theatre, Endemism, Baghdad, Oxford, Technology, Nitrogen, Leeds, Anatolia, Delaware, War, Palestine, Belize, Sony, Bollywood, Statistics, Tasmania, Schizophrenia, Johannesburg, Art, Terrorism, Suriname, Stuttgart, Mozambique, Pregnancy, Lead, Racism, Intel, Wii, Toyota, Potato, Vietnam_War, Temperature, Geology, American_Civil_War, Thessaloniki, Greeks, Opera, Biodiversity, Guam, Bermuda, Zambia, Photography, Beer, Extinction, Czech_Republic, Spider, Saudi_Arabia, Balkans, American_football, Rihanna, Barbados, Sport, Desert, Ultraviolet, Cambridge, Anarchism, Email, Baptism, Antisemitism, Java, Kent, Indianapolis, German_language, Politics

601-700:

Mecca, Drama, Jainism, Sufism, Moses, Metallica, Tibet, Sheffield, Ecosystem, Taliban, Metabolism, Conservatism, Batman, Algorithm, Crete, Cocaine, Alcohol, New_Jersey, Planet, Celts, Zagreb, Honolulu, Coca-Cola, Lyon, Mountain, Venus, Vertebrate, Abortion, Bat, Violin, Romanticism, Maldives, Sofia, Yorkshire, Superman, Honda, Nintendo, Havana, Meat, Anglicanism, Republic, Inflation, Guyana, Ammonia, Jay-Z, Geography, Fossil, Copyright, Neolithic, Sulfur, Sharia, Energy, Helicopter, Mineral, Guangzhou, Genetics, Blood, Ship, Obesity, Diamond, Cold_War, Smallpox, Osaka, Bishop, Yahoo!, Yugoslavia, Chad, Library, Physician, Bratislava, Tajikistan, Andalusia, Asphalt, Ethics, Red, Methodism, HBO, Lima, Professor, Town, Prostitution, Apple, Writer, Puerto_Rico, Blue, Tax, Taoism, Liver, CNN, Time, Sardinia, HTML, Myspace, Architecture, Hydroelectricity, Taipei, Potassium, William_Shakespeare, George_Washington, Pinyin

701-800:

Uranium, Riga, Hypertension, Ljubljana, Cotton, Bihar, Wiki, Wellington, Calcium, X-ray, ITunes, Soil, Elizabeth_II, Quakers, Macintosh, Mayor, Honey, Flower, Alcoholism, Satire, Country, Assam, Lancashire, Walmart, Soybean, Himalayas, Concrete, Asthma, Mining, Antwerp, Lahore, Baku, Gospel, Montevideo, Feudalism, Castle, Allmusic, WWE, Genoa, Police, Calvinism, Yoga, Primate, Alexandria, Saturn, Eritrea, Saint_Petersburg, Krishna, Homer, Lesbian, Barley, Dresden, Antibacterial, Logic, Baptists, Turkmenistan, Ant, Mitochondrion, Rape, Strasbourg, Leipzig, Judo, Kidney, Bali, Tiger, Nationalism, Mythology, Heart, Disease, Botswana, Seville, Dhaka, Salt, Insurance, Algae, Michael_Jackson, Malayalam, BMW, Unicode, Sodium, Tobacco, Satellite, Oak, Patent, Metro-Goldwyn-Mayer, Banana, Harvard_University, Bank, Rapping, IPad, PHP, Byzantine_Empire, Organism, Vilnius, Mosque, Santiago, Sparta, Marketing, Mahabharata, Slavs

801-900:

Synthesizer, Transylvania, Talmud, Book, Nokia, Malawi, French_Revolution, Magnesium, Glacier, Rajasthan, Danube, Constitution, Cher, Hewlett-Packard, Cheese, Tea, Crustacean, Liechtenstein, Dorset, Software, Agnosticism, Photosynthesis, Northern_Ireland, Anatomy, Flowering_plant, Nile, Guinea, Infrared, Oceania, Helium, Gothenburg, Rotterdam, Sarajevo, Wi-Fi, North_Korea, Ronald_Reagan, Immigration, Friends, Easter, Apollo, Glass, Goa, Sex, Queens, Cholera, Geometry, Plastic, Ocean, Muscle, Reggae, Microsoft_Windows, FIFA, Andorra, Russians, Tallinn, Autism, EMI, Gravitation, Smartphone, Shark, Pornography, Olympic_Games, Tram, Tornado, York, Xinjiang, Website, Vegetarianism, Influenza, Ancient_Rome, UEFA, Limestone, Database, Sea, Leaf, Zoroastrianism, Universe, Motorcycle, Politician, Museum, Chromosome, Trinity, Samoa, Torah, Hezbollah, Bologna, Bill_Clinton, Death, Rhine, Deforestation, Nickel, Romanization, Vagina, Abraham_Lincoln, Metal, Eucharist, Burundi, Southampton, Akbar, Thermodynamics

901-1000:

Bordeaux, Zeus, Dam, Paleontology, Baroque, Assyria, Passerine, Tomato, Light, Greek_language, Rodent, Habitat, Surrey, Biochemistry, Airport, Hamlet, Saxophone, Murder, Galaxy, Unemployment, Somerset, Basel, RNA, Continent, Benin, Adolescence, Nairobi, Erosion, Cicero, Niger, Aberdeen, Titanium, Brittany, Andes, Family, Rain, Mauritania, Comet, Arabic_language, North_Carolina, Bicycle, Photon, Pop_music, Korean_War, Chicken, Metre, Ganges, EBay, Devon, Wood, Orchidaceae, Kabul, Jersey, Radar, Hamas, Synthpop, Monocotyledon, Odisha, Area, Life, JavaScript, Communication, Refugee, Inflammation, Herodotus, Gabon, Confucianism, PH, Pluto, Kilogram, Aesthetics, Spider-Man, Michelangelo, Nottingham, Amtrak, United_States_dollar, Mercedes-Benz, Flute, Islamabad, Penis, LGBT, Vanuatu, Teacher, Island, Population_density, Ankara, Unix, White, Tin, Chlorine, Zionism, Military, Latitude, Laser, Firefox, IOS, Tuscany, Phosphorus, Comedy, Science_fiction, Research

Other notes

No ranking is perfect, and importance is subjective. Some people will want more asteroids or car models; others will want more football players or music albums. However, the above listing is relatively stable: if we adjust the relative weights of the various factors, the articles reshuffle a little, but the list looks basically the same.
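One way to put a number on that kind of stability is to compare the top-k sets produced by two different weightings. The sketch below is an illustration only: the function names, the feature dictionary layout, and the overlap measure are assumptions, not the check we actually ran.

```python
def weighted_scores(features, weights):
    """features: {title: {feature_name: value}}; returns {title: score}."""
    return {
        title: sum(weights[name] * value for name, value in feats.items())
        for title, feats in features.items()
    }


def top_k(scores, k):
    """Set of the k highest-scoring titles."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def top_k_overlap(features, weights_a, weights_b, k):
    """Fraction of the top-k list shared by two weightings (1.0 = identical sets)."""
    a = top_k(weighted_scores(features, weights_a), k)
    b = top_k(weighted_scores(features, weights_b), k)
    return len(a & b) / k
```

An overlap close to 1.0 across a range of plausible weightings would correspond to what we mean by "basically the same" list; it ignores ordering inside the top k, which a rank correlation would capture instead.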

Another side effect of ranking Wikipedia articles is that we can evaluate the signal-to-noise ratio. Very loosely speaking, we believe that approximately half a million Wikipedia articles are solid encyclopedic topics. The remaining 3.8 million or so tend to cover geographical locations (e.g., a town in Siberia), popular culture artifacts (music albums, old TV shows), lesser companies, and politicians, sports figures and other people. The lowest-ranking articles were often wikispam, and many had already been removed from Wikipedia by dutiful editors.

 
