HyperText Markup Language (HTML) is a markup language designed for creating web pages, that is, documents presented on the World Wide Web. HTML is defined as a simple "application" of SGML, a standard used by organizations with complex publishing requirements, and is now an Internet standard maintained by the World Wide Web Consortium (W3C). The most recent version is HTML 4.01, though it has been superseded by XHTML.
HTML generally appears in text files stored on computers connected to the Internet. These files contain plain text together with markup, that is, instructions that tell a program how to display or process the text. The files are usually transferred across the Internet using the HTTP network protocol, after which the HTML may be rendered by a visual web browser, an aural browser (which reads the text of the page to the user), a braille reader (which converts pages to a braille format), an email client, or a wireless device such as a cellular phone.
There are four kinds of markup elements in HTML:
- structural markup that describes the purpose of text (for example, <h1>Golf</h1> will cause a reader to treat "Golf" as a first-level heading),
- presentational markup that describes the visual appearance of text regardless of its function (for example, <b>boldface</b> will render boldface text); note that presentational markup is deprecated and is not recommended, and authors should use CSS for presentation,
- hypertext markup that links parts of the document to other documents (for example, <a href="http://www.wikipedia.org/">Wikipedia</a> will render the word Wikipedia as a hyperlink to the specified URL), and
- widget elements that create objects (for example, buttons and lists).
Apart from the deprecated presentational markup, the markup itself does not determine how the content it encloses will look. It only states the role of that content: whether it is a paragraph, a heading, a list, or a link. It is up to Cascading Style Sheets (CSS) to determine how each heading, list, and so on will appear.
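The four kinds of markup can be seen side by side in a short fragment. This is only an illustrative sketch intended to sit inside the body of an HTML 4.01 document; the sentence text, the form's action URL, and the field names are hypothetical placeholders.

```html
<!-- Structural markup: identifies "Golf" as a first-level heading -->
<h1>Golf</h1>

<!-- Presentational markup: asks for boldface regardless of function -->
<p>Golf is played on a <b>course</b>.</p>

<!-- Hypertext markup: links this document to another document -->
<p>See <a href="http://www.wikipedia.org/">Wikipedia</a> for more.</p>

<!-- Widget elements: a selection list and a button inside a form.
     The action URL and the field names are placeholders. -->
<form action="http://www.example.com/search" method="get">
  <p>
    <select name="topic">
      <option>Rules</option>
      <option>Equipment</option>
    </select>
    <input type="submit" value="Go">
  </p>
</form>
```

None of these elements, apart from <b>, says anything about fonts, colors, or spacing; a browser applies its default style unless a style sheet says otherwise.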
Separation of style and content
Efforts of the web development community have led to a new way of thinking about how a web document should be written; XHTML epitomizes this effort. Standards stress markup that conveys the structure of the document, such as headings, paragraphs, block quotations, and tables, over markup written for visual purposes only, such as <font>, <b> (bold), and <i> (italics). Deprecated presentational elements such as <font> have been removed from the HTML 4.01 Strict and XHTML 1.0 Strict specifications in favor of CSS solutions, although <b> and <i> remain available. CSS provides a way to separate the structure of an HTML document from its presentation. See separation of style and content.
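As a rough illustration of this separation, a warning that presentational markup would write as <font color="red"><b>Warning</b></font> can instead be marked up by role and styled from a style sheet. The sketch below assumes an embedded style sheet; the class name "warning", the title, and the sample text are hypothetical choices, not part of any specification.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Style and content example</title>
  <!-- Presentation lives here, apart from the document body -->
  <style type="text/css">
    h1       { font-family: sans-serif; }
    .warning { color: red; font-weight: bold; }
  </style>
</head>
<body>
  <!-- The body states only what each piece of content is -->
  <h1>Course notice</h1>
  <p class="warning">The back nine is closed today.</p>
</body>
</html>
```

Changing the look of every warning then means editing one rule in the style sheet rather than every <font> element in the document.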
The document type definition (DTD)
All HTML documents should start with a document type declaration, which names the Document Type Definition (DTD) that the document follows. For example:
- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
This declares a document that conforms to the Strict DTD of HTML 4.01, which is largely structural, leaving formatting to Cascading Style Sheets. Other DTDs, including Transitional (also known as Loose) and Frameset, define different rules for the use of the language.
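For context, a minimal complete document using this declaration might look like the following sketch. The title and paragraph text are placeholders; the declaration must be the first thing in the file, and the Strict DTD requires a <title> and block-level content inside <body>.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <!-- A title is required by the DTD -->
  <title>Minimal Strict document</title>
</head>
<body>
  <!-- Text must sit inside a block-level element such as <p> -->
  <p>Hello, World Wide Web.</p>
</body>
</html>
```

A validating tool can check a document like this against the DTD named in the declaration and report elements or attributes that the chosen DTD does not allow.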
Version history of the standard
- HTML 2.0 (RFC 1866): approved as a proposed standard September 22, 1995,
- HTML 3.2: January 14, 1997,
- HTML 4.0: December 18, 1997,
- HTML 4.01 (minor fixes): December 24, 1999,
- ISO/IEC 15445:2000 ("ISO HTML", based on HTML 4.01 Strict): May 15, 2000.
There is no official HTML 1.0 specification because there were multiple informal HTML standards at the time. Work on a successor for HTML, then called 'HTML+', began in late 1993, designed originally to be "A superset of HTML … [which] allows a gradual rollover from the previous format [HTML]" (Dave Raggett, September 1993). The first formal specification was therefore given the version number 2.0 in order to distinguish it from these unofficial "standards". Work on HTML+ continued, but this never became a standard.
The HTML 3.0 standard was proposed by the newly formed W3C in March 1995 and provided many new capabilities, such as support for tables, text flow around figures, and the display of complex math elements. Even though it was designed to be compatible with HTML 2.0, it was too complex to be implemented at the time, and when the draft expired in September 1995 it was not continued, owing to lack of browser support. HTML 3.1 was never officially proposed, and the next standard proposal was HTML 3.2, which dropped the majority of the new features in HTML 3.0 and instead adopted many browser-specific elements and attributes that had been created for the Netscape and Mosaic web browsers. Support for math as proposed by HTML 3.0 finally came with a separate standard, MathML.
HTML 4.0 likewise adopted many browser-specific elements and attributes, but at the same time began to try to 'clean up' the standard, by marking some of them as 'deprecated'.