Commons:Category structure

Revision as of 17:29, 10 June 2011 by Rd232
The text below is a draft; everything except "Overview" is simply split from Commons:Categories. A draft of Commons:Categories as it would look after the split is also available.

The category structure is the primary way to organize and find files on the Commons. It is essential that every file can be found by browsing the category structure. To allow this, each file must be put into a category directly. Each category should itself be in more general categories, forming a hierarchical structure.

Overview

  • The general rule is to always place an image in the most specific categories, and not in the levels above those.
  • Category names should generally be in English.
  • Category names that refer to types of objects or groups of people should generally be in plural form, e.g. Category:Tools, as opposed to general themes or activities such as Category:History.
  • Where a category exists for a specific subject (example: Category:Mohandas K. Gandhi), image files within it should not also be listed directly in any category that this category is (or should be) listed in, such as Category:People of British India. However, categories that are more specific than the main subject category (e.g. Category:1932 in India, Category:People of India in 1946) should be applied to an image or to an appropriate subcategory.
  • To help Wikipedia users navigate the category structure, it is very helpful to provide appropriate links from the relevant Wikipedia pages and categories. Many Wikipedia language versions have templates for this purpose, often called "commons" and "commonscat", which can be added to the relevant pages and categories with {{Commons}} and {{Commonscat}}.

Category structure in Wikimedia Commons

The category structure is (ideally) a multi-hierarchy with a single root category, Category:CommonsRoot.

  • All categories (except CommonsRoot) should be contained in at least one other category.
  • There should be no cycles (i.e. a category should not contain itself, directly or indirectly).
  • The category structure should reflect a hierarchy of concepts, from the most generic one down to the very specific.
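The first two rules are mechanical enough to check with a short program. The sketch below is only an illustration: it assumes the category graph has already been loaded into a Python dictionary mapping each category name to the set of categories it is placed in (a hypothetical in-memory representation, not a MediaWiki or Commons API), and the function name check_structure and the example category names are likewise made up.

```python
from typing import Dict, List, Set


def check_structure(parents: Dict[str, Set[str]], root: str = "CommonsRoot") -> List[str]:
    """Report orphan categories and category cycles.

    `parents` maps a category name to the set of categories it is placed in.
    This dict representation is an assumption for illustration only.
    """
    # Every name that appears anywhere, including parents with no entry of their own.
    all_cats = set(parents) | {p for ps in parents.values() for p in ps}
    problems: List[str] = []

    # Rule: every category except the root is contained in at least one other category.
    for cat in sorted(all_cats):
        if cat != root and not parents.get(cat):
            problems.append(f"orphan: {cat}")

    # Rule: no cycles. Depth-first search along "is placed in" links;
    # state 1 = on the current search path, state 2 = fully explored.
    state: Dict[str, int] = {}

    def visit(cat: str, path: List[str]) -> None:
        state[cat] = 1
        for parent in sorted(parents.get(cat, set())):
            if state.get(parent) == 1:
                # `parent` is already on the current path: we have looped back to it.
                cycle = path[path.index(parent):] + [parent]
                problems.append("cycle: " + " -> ".join(cycle))
            elif parent not in state:
                visit(parent, path + [parent])
        state[cat] = 2

    for cat in sorted(all_cats):
        if cat not in state:
            visit(cat, [cat])
    return problems


# Hypothetical example graph.
graph = {
    "CommonsRoot": set(),
    "Topics": {"CommonsRoot"},
    "Spheres": {"Topics"},
    "Glass": set(),          # orphan: not placed in any category
    "Loop A": {"Loop B"},
    "Loop B": {"Loop A"},    # Loop A and Loop B contain each other
}
print(check_structure(graph))
# prints ['orphan: Glass', 'cycle: Loop A -> Loop B -> Loop A']
```

Taken together, the two checks also give the single root: if no category other than CommonsRoot lacks a parent and there are no cycles, then following parent links upward from any category must eventually end at CommonsRoot.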

Major categories

The top-most categories (the ones contained directly in CommonsRoot) divide the category structure by the purpose of the contained categories:

  • Category:Topics - This category is the global common root of the media files categorized by the TOPIC. ALL media files should be categorized under this category for the sake of allowing others to find them by topic. Topical categories shouldn't be included through templates.
  • Category:Copyright statuses - This category is the global common root of the media files categorized by the LICENSE. ALL media files should be categorized under this category with the appropriate license tag. This type of category is added through the license templates, which include it.
  • Category:Image sources - This category is the global common root of the media files categorized by the SOURCE, where they come from (books, collections, sites, etc.). This type of category is generally added by template.
  • Category:Media types - This category is the global common root of the media files categorized by media TYPE. Please note that this type of categorization is sometimes omitted for images, since the vast majority of files on Commons are images of some sort.
  • Category:Commons - This category is the global common root for categorizing Commons' maintenance tasks and pages (pages in the Commons: and Help: namespaces), excluding media files. Translated pages should be categorized under their language categories, using the "Category:Commons-ISO-LANGUAGE-CODE" style. The structure of Category:Commons-en is the model hierarchy for every other language subcategory. Do not use two colons in category or page names. See this discussion and Help:Namespaces.
There is a subcategory, Category:Commons maintenance content, for the special maintenance of Wikimedia Commons' global common content, which is not translated. ALL media files should be categorized under the first four categories above, but ONLY files that have problems and need to be fixed should also be in the subcategory Category:Commons maintenance content.
  • Category:Users - This is for categories that contain Commons users' galleries, images and texts, sorted by things such as the languages they speak. It also contains Category:User galleries, which is for user-specific (i.e. non-topic) galleries that do not need to be in English.

How to use categories

You should always put your uploads into categories and/or gallery pages according to topic, so your contributions can be found and used by others.

It is rarely necessary to create a new category (there are exceptions, such as when uploading a new text; see also People below). Before doing so, make sure you are familiar with the existing category structure and with the customs and policies of Commons. Check whether a category scheme or a Commons project exists for your topic, and follow the conventions described there.

Category names

Category names should generally be in English (see Commons:Language policy). However, there are exceptions.

Category names that refer to types of objects or groups of people should generally be in plural form (Category:Tools, Category:Artists, Category:Lakes, Category:Paintings, Category:Sculptures, etc.), as opposed to names for general themes or activities (Category:History, Category:Weather, Category:Music, Category:Painting, Category:Sculpture) or for a particular individual object (a specific building, monument, artwork, etc.). See the Naming categories proposal for more information.

Categories grouping subcategories by name should generally be named "by name" rather than "by alphabet" (e.g. Category:Ships by name).

We still lack internationalization for category names, but this issue should be resolved with appropriate changes to the MediaWiki software (see bugzilla:5638). Creating intermingled category structures in different languages would only make things worse.

For a general discussion of MediaWiki's category feature, see the manual page on categories.

For more appropriate categorization

Pages (including category pages) are categorized according to their subject, not their contents, because the contents are generally not a permanent feature of a category page; in particular, a category page may temporarily contain inappropriate content.

Example: Assume that Category:Spheres contains only pictures of crystal balls. You must not add Category:Glass to the category page on the basis of its current contents, because spheres can be made of a great variety of materials. Normally, any picture showing a glass object would already be categorized in Category:Glass (or in one of its subcategories). So, if Category:Spheres is really crowded with pictures of crystal balls, it would be better to create a new category, such as Category:Glass spheres or Category:Crystal balls, categorized in both Category:Spheres and Category:Glass.

Generally, files should only be in the most specific category that exists for a given topic. For example, files in Category:Looking up the center of the Eiffel Tower should not also be in Category:Paris (see Over-categorization below). If you do not find a category that fits your purpose, you can create it, but first read the section How to use categories carefully.

This does not mean that an image belongs in only one category; it just means that images should not be in redundant or non-specific categories. For instance, an image of a polar bear being rescued from an iceberg by a helicopter should be in Category:Ursus maritimus, Category:Icebergs, Category:Helicopters and Category:Search and rescue. It should not, however, be in Category:Ursidae or Category:Aircraft, because those are broader categories above Category:Ursus maritimus and Category:Helicopters.
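The "most specific categories only" rule can be sketched as a small computation: from a set of candidate categories, drop every one that is an ancestor of another candidate. This assumes a hypothetical dictionary from each category to its parent categories; the parent links below (for instance Category:Icebergs placed in a category "Ice") are a simplified, made-up fragment rather than the real Commons hierarchy, and ancestors and most_specific are illustrative names.

```python
from typing import Dict, Iterable, Set


def ancestors(cat: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Every category reachable from `cat` by following parent links upward."""
    seen: Set[str] = set()
    stack = list(parents.get(cat, set()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, set()))
    return seen


def most_specific(candidates: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Keep only candidates that are not an ancestor of another candidate."""
    cands = set(candidates)
    redundant: Set[str] = set()
    for c in cands:
        # Any candidate lying above `c` in the hierarchy is implied by `c`.
        redundant |= ancestors(c, parents) & cands
    return cands - redundant


# Simplified, hypothetical parent links for the polar bear example.
parents = {
    "Ursus maritimus": {"Ursus"},
    "Ursus": {"Ursidae"},
    "Helicopters": {"Aircraft"},
    "Icebergs": {"Ice"},
    "Search and rescue": {"Rescue"},
}
chosen = most_specific(
    ["Ursus maritimus", "Ursidae", "Icebergs", "Helicopters", "Aircraft", "Search and rescue"],
    parents,
)
print(sorted(chosen))
# prints ['Helicopters', 'Icebergs', 'Search and rescue', 'Ursus maritimus']
```

Several unrelated categories survive; only those implied by another chosen category are removed, which matches the point that an image may well belong in many categories.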

Categorization tips

The categories (or galleries) you choose for your uploads should answer as many as possible of the following questions:


The above questions cover the main aspects of the image to be categorized. For some images it makes sense to use all of them; for others only one or two are relevant. In addition, there are several other aspects of an image that can be used to categorize it:

This last set is useful and important, but should always be used in addition to the main set of criteria.

Find an appropriate category

To find appropriate categories for your uploads, navigate the category structure starting from a generic category. Narrow your search down through subcategories until you find the most specific category that fits the file you uploaded. You can navigate the category structure by following links to subcategories, or by expanding the tree of subcategories by clicking the small + symbols next to subcategory names. The Major categories section above provides a starting point, and the section How to categorize: guidance by topic covers some topics in more detail. You can also try CommonSense, a tool designed to help with categorization based on keywords.

Over-categorization

Don't place an item in both a category and its parent or any higher category (e.g. put it in 'Black and white photographs of Tour Eiffel', not in both 'Black and white photographs of Tour Eiffel' and 'Paris').

Over-categorization is what happens when an image is placed in several categories along the same branch of the category tree, i.e. in a category and also in one of that category's ancestors. The general rule is to always place an image in the most specific categories, and not in the levels above those. An example:

We'll assume that yellow spheres are spheres with a yellow color. Consider Category:Yellow spheres, a subcategory of Category:Spheres. The picture to be categorized shows yellow marbles, so we categorize the file in Category:Yellow spheres. If we also categorize the image in Category:Spheres, this is over-categorization, because we already know that yellow marbles are spheres. The same applies to most images: as mentioned above, files in Category:Black and white photographs of Tour Eiffel should not also be in Category:Paris, files in Category:Albert Einstein should not be in Category:Physicists from Germany, and so on.
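For a file that is already categorized, a check can also explain why a category is redundant by showing the chain of parent links that implies it. The sketch below uses breadth-first search over a hypothetical child-to-parents dictionary; path_to and overcategorized are illustrative names, and the intermediate links (Tour Eiffel placed in "Buildings in Paris", which is placed in Paris) are invented for the example rather than taken from Commons.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def path_to(start: str, target: str, parents: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Shortest chain of parent links from `start` up to `target`, or None."""
    prev: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        cat = queue.popleft()
        if cat == target:
            # Walk back through the recorded predecessors to rebuild the chain.
            chain: List[str] = []
            node: Optional[str] = cat
            while node is not None:
                chain.append(node)
                node = prev[node]
            return chain[::-1]
        for p in sorted(parents.get(cat, set())):
            if p not in prev:
                prev[p] = cat
                queue.append(p)
    return None


def overcategorized(file_cats: List[str], parents: Dict[str, Set[str]]) -> List[Tuple[str, List[str]]]:
    """For each category on a file that another of its categories already implies,
    return (redundant category, chain of parent links proving it)."""
    report: List[Tuple[str, List[str]]] = []
    for broad in file_cats:
        for specific in file_cats:
            if specific != broad:
                chain = path_to(specific, broad, parents)
                if chain:
                    report.append((broad, chain))
                    break
    return report


# Hypothetical, simplified parent links.
parents = {
    "Yellow spheres": {"Spheres"},
    "Black and white photographs of Tour Eiffel": {"Tour Eiffel"},
    "Tour Eiffel": {"Buildings in Paris"},
    "Buildings in Paris": {"Paris"},
}
for broad, chain in overcategorized(["Black and white photographs of Tour Eiffel", "Paris"], parents):
    print(f"{broad} is redundant: " + " -> ".join(chain))
# prints Paris is redundant: Black and white photographs of Tour Eiffel -> Tour Eiffel -> Buildings in Paris -> Paris
```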

Visually, it is the same problem as the red arrow shown in the section above.

Why is over-categorization a problem?

It's often assumed that the more categories an image is in, the easier it will be to find. By that logic, every image showing a man should be in Category:Men, because even if you know nothing more about the person you're looking for than that he is a man, you'll be able to find the image. The result is that the top category fills up, making it necessary to go through hundreds, or in this case more likely thousands, of images to find the one you want. You probably won't find what you're looking for, and, what's more, those looking for a generic picture of a man to illustrate an article like en:Man will find such pictures drowned out among the movie stars, scientists and politicians.

At lower levels the problem becomes less acute, since there are fewer images, although they can still easily number in the hundreds. But there is still a problem. Let's go back to Einstein. I know that he was a physicist, so I'll look there. Among the hundreds of images in the category I find one of him; I'm not very happy with it, but it's the only one there. Because I found an image there, I assume there are no others hidden elsewhere, so I don't look further in Category:Physicists from Germany, where I would have found Category:Albert Einstein and perhaps a better image. So over-categorization has led to two problems: the top category is cluttered, and users stop looking for the most relevant category once they've reached one that has a relevant image.

Improper categorization of categories is a cause of over-categorization

Strange as it may sound, under-categorization can actually cause over-categorization. This happens when a category is not itself properly categorized, leading users to over-categorize an image to get it into the relevant categories. An example: Category:Eivør Pálsdóttir was categorized only in Category:People by name. So if I add an image of her, and know who she is, I would also place the image in Category:People of the Faroe Islands and Category:Vocalists. That is over-categorization: I've cluttered the top categories by adding images directly to them.

A related problem is erroneous categorization: Category:Notting Hill was for more than a month placed in Category:London. When adding an image, it would be very tempting to also add that image to Category:Royal Borough of Kensington and Chelsea, the borough in which Notting Hill lies. Instead, each image should be placed only in the most specific categories, and those categories should in turn be placed in their most specific parent categories.

When you encounter this, please categorize the categories properly if you are able to do so. That will not only help avoid over-categorization, but also make it easier to move through the category tree.

How to categorize: guidance by topic

For some categories, there is special guidance on how best to sort content within that category. This guidance can be found in a category scheme or a Commons project for your topic. There is also some categorizing information in this section, and sometimes there is guidance at the top of the category's page, in the Category namespace. So, for instance, some guidance on categorizing content depicting people is at the top of Category:People, and some is in the People section below.

People

Content depicting people can be put in categories and/or galleries which describe them, such as Category:Economists from the United States. Start exploring at Category:People.

Please see Commons:Category scheme People for details on how to name and organize these categories.

Landscapes, outdoor views

A
  > Views of B from A
B
  > Views of B from A

A
  > Views from A
    > Views of B from A
B
  > B skylines (or similar)
    > B skylines from A

If there is a series of similar views of "B" taken from "A", these can be placed in a category "Views of B from A". This category should be a subcategory of both "A" and "B".

Sometimes it makes sense to have an intermediate category in one or the other hierarchy: e.g. Category:Views from the Empire State Building or Category:Seattle skylines.

Texts

Texts, such as scans of books, should normally have a category for each version of the scan and each edition of the text. Thus a book published in three separate editions would have a parent category for the book, one subcategory for each of the three editions, and further subcategories for each edition as JPEG, DjVu, etc., assuming each version has actually been uploaded (categories would not be created for editions not held on Commons). This is particularly important for files in formats other than DjVu and PDF, where the category is the only practical means of keeping the scans together; see e.g. Category:The Chronicles of England, Scotland and Ireland, Holinshed, 1587, which contains 2857 JPEG images of page scans.