URL
A URL (Uniform Resource Locator) is a standardized text string that specifies where a resource is located on the Internet and how to access it. It is the most common form of URI (Uniform Resource Identifier) and is primarily used to identify the location of web pages, images, files, and other data on the World Wide Web.
Essentially, a URL is an address that tells a web browser or other client exactly where to find something on a network and what protocol to use to retrieve it.
Structure of a URL
A URL is composed of several parts, though not all parts are always present in every URL. The general structure can be broken down as follows:
`scheme://authority/path?query#fragment`
Let's break down the main components:
- Scheme: This is the first part of the URL and specifies the protocol to be used to access the resource. It is followed by a colon (`:`). Common schemes for the web are `http` and `https` (the latter for encrypted web traffic), but other schemes like `ftp` (for file transfer), `mailto` (for email addresses), or `file` (for local files) also exist.
- Authority: This part is typically preceded by a double slash (`//`) and identifies the server or host where the resource is located. It can itself be broken down into sub-components:
  * `[userinfo@]` (Optional): Contains a username and optional password separated by a colon (`:`), followed by an "at" symbol (`@`). This is rarely used in modern web URLs due to security concerns.
  * Host: The domain name (e.g., `www.example.com`) or IP address (e.g., `192.168.1.1`) of the Web server or other host providing the resource. The browser uses the DNS to translate a hostname into an IP address.
  * `[:port]` (Optional): The port number on the host to which the client should connect. If omitted, the default port for the specified scheme is used (e.g., 80 for HTTP, 443 for HTTPS).
- Path: This part identifies the specific resource within the host. It consists of a sequence of path segments separated by slashes (`/`), similar to directories and file names in a file system path. It indicates the location of the resource on the server.
- Query: This optional part is preceded by a question mark (`?`) and contains data to be passed to the server, often for dynamic resources. By convention it consists of key-value pairs separated by ampersands (`&`), with each key joined to its value by an equals sign (e.g., `?search=test&category=web`). The server-side script can access and use this data to generate the response.
- Fragment: This optional part is preceded by a hash symbol (`#`) and specifies a secondary resource or a portion of the primary resource. On web pages, it is often used to link to a specific section within an HTML document (using the ID attribute of an element). The fragment is processed by the browser and is typically not sent to the Web server.
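The component breakdown above can be sketched with Python's standard `urllib.parse` module. This is a minimal illustration, not a full URL parser: the helper name `describe`, the `DEFAULT_PORTS` table, and the sample URL are assumptions made for this example, and only the two default ports mentioned above are covered.

```python
from urllib.parse import urlsplit, parse_qs

# Default ports for the two web schemes named above (illustrative, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe(url):
    """Split a URL into the components described above (hypothetical helper)."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not give one.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "username": parts.username,      # None when there is no userinfo
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "query": parse_qs(parts.query),  # maps each key to a list of values
        "fragment": parts.fragment,
    }

# Sample URL invented for this sketch.
info = describe("https://www.example.com/docs/index.html?search=test&category=web#intro")
for name, value in info.items():
    print(f"{name}: {value}")
```

Note that `parse_qs` returns a list for every key, because the same key may appear more than once in a query string.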
Examples
Let's look at a few examples to illustrate the components:
`https://www.example.com:8080/path/to/resource.html?query=value#section`
- **Scheme:** `https`
- **Authority:** `www.example.com:8080`
  * **Host:** `www.example.com`
  * **Port:** `8080`
- **Path:** `/path/to/resource.html`
- **Query:** `query=value`
- **Fragment:** `section`
`ftp://user:password@ftp.example.com/public/file.txt`
- **Scheme:** `ftp`
- **Authority:** `user:password@ftp.example.com`
  * **Userinfo:** `user:password`
  * **Host:** `ftp.example.com`
- **Path:** `/public/file.txt`
`mailto:user@example.com`
- **Scheme:** `mailto`
- **Authority:** (None in this scheme)
- **Path:** `user@example.com` (The structure varies by scheme)
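As a rough check of these breakdowns, the sketch below splits each example with Python's `urllib.parse` and then reassembles it from its parts. The `mailto` URL is assumed to be `mailto:user@example.com`, inferred from the path listed above; the variable names are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

examples = [
    "https://www.example.com:8080/path/to/resource.html?query=value#section",
    "ftp://user:password@ftp.example.com/public/file.txt",
    "mailto:user@example.com",  # assumed form of the third example
]

rebuilt = []
for url in examples:
    parts = urlsplit(url)
    # netloc holds the whole authority; it is empty for schemes such as mailto.
    print(f"{parts.scheme}: authority={parts.netloc!r}, path={parts.path!r}")
    # Reassemble the URL from its five components.
    rebuilt.append(urlunsplit(parts))

print(rebuilt == examples)
```

Splitting and reassembling yields the original strings for these three examples, which suggests the five components carry all the information in each URL.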
Relation to URIs and URNs
A URL is a type of URI. The term URI is a more general concept that encompasses both URLs (which specify location) and URNs (Uniform Resource Names), which specify a resource by name without necessarily indicating its location (like an ISBN for a book). All URLs are URIs, but not all URIs are URLs. However, in common usage, "URL" is often used generically to refer to URIs, especially web addresses.
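The distinction can be illustrated, very roughly, by looking at the scheme alone. The sketch below is a deliberate simplification: the function name `kind` and the sample ISBN value are made up for illustration, and treating every non-`urn` scheme as a URL glosses over cases that the formal definitions handle more carefully.

```python
from urllib.parse import urlsplit

def kind(uri):
    """Classify a URI by scheme only (a simplification for illustration)."""
    scheme = urlsplit(uri).scheme.lower()
    if not scheme:
        return "no scheme"
    # URNs use the "urn" scheme; anything else is treated as a locator here.
    return "URN" if scheme == "urn" else "URL"

print(kind("urn:isbn:0451450523"))       # a name, with no location implied
print(kind("https://www.example.com/"))  # a location plus an access protocol
```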
Encoding
URL encoding, also known as percent-encoding, is a mechanism for representing characters that are not allowed in a URL (such as spaces) or that have special meaning within a URL's structure (such as `?`, `#`, `/`, `&`, `=`, `:`, `;`, `+`, `,`, and `@`) when they must appear as literal data. Each such character is replaced by a percent sign (`%`) followed by the two-digit hexadecimal value of its byte. For example, a space character is encoded as `%20`. This encoding primarily applies to characters within the path, query, and fragment components.
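A short sketch of percent-encoding using `quote` and `unquote` from Python's `urllib.parse`. The file name and the second sample string are invented for illustration; passing `safe=""` asks `quote` to encode every reserved character, including `/`.

```python
from urllib.parse import quote, unquote

# Hypothetical file name containing spaces.
segment = "annual report 2024.pdf"
encoded = quote(segment, safe="")
print(encoded)           # annual%20report%202024.pdf

# Characters with structural meaning are escaped when used as literal data.
print(quote("a/b?c=d&e", safe=""))   # a%2Fb%3Fc%3Dd%26e

# Decoding reverses the substitution.
print(unquote(encoded))  # annual report 2024.pdf
```

Letters, digits, and the characters `-`, `.`, `_`, and `~` are left unchanged, since they never need escaping.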
See Also
- URI
- World Wide Web
- HTTP
- DNS
- Web server
- Protocol
- Hostname
- IP address
- Port (computer networking)
- Resource
References
- RFC 3986: Uniform Resource Identifier (URI): Generic Syntax - The official technical specification document defining URI syntax, which includes URLs.
- MDN Web Docs - URL - Provides a clear explanation and breakdown of URL components.
- Cloudflare - What is a URL? - An online resource explaining URLs for a general audience.