The above requirements apply in XML documents as well.
Here a blog uses the srcdoc attribute in conjunction
with the sandbox attribute described below to provide
users of user agents that support this feature with an extra layer of protection from script
injection in the blog post comments:
<article>
 <h1>I got my own magazine!</h1>
 <p>After much effort, I've finally found a publisher, and so now I
 have my own magazine! Isn't that awesome?! The first issue will come
 out in September, and we have articles about getting food, and about
 getting in boxes, it's going to be great!</p>
 <footer>
  <p>Written by <a href="/users/cap">cap</a>, 1 hour ago.
 </footer>
 <article>
  <footer> Thirteen minutes ago, <a href="/users/ch">ch</a> wrote: </footer>
  <iframe sandbox srcdoc="<p>did you get a cover picture yet?"></iframe>
 </article>
 <article>
  <footer> Nine minutes ago, <a href="/users/cap">cap</a> wrote: </footer>
  <iframe sandbox srcdoc="<p>Yeah, you can see it <a href=&quot;/gallery?mode=cover&amp;amp;page=1&quot;>in my gallery</a>."></iframe>
 </article>
 <article>
  <footer> Five minutes ago, <a href="/users/ch">ch</a> wrote: </footer>
  <iframe sandbox srcdoc="<p>hey that's earl's table.<p>you should get earl&amp;amp;me on the next cover."></iframe>
 </article>
Notice the way that quotes have to be escaped (otherwise the srcdoc attribute would end prematurely), and the way raw
ampersands (e.g. in URLs or in prose) mentioned in the sandboxed content have to be
doubly escaped — once so that the ampersand is preserved when originally parsing
the srcdoc attribute, and once more to prevent the
ampersand from being misinterpreted when parsing the sandboxed content.
Furthermore, notice that since the DOCTYPE is optional in
iframe srcdoc documents, and the html,
head, and body elements have optional
start and end tags, and the title element is also optional in iframe srcdoc
documents, the markup in a srcdoc attribute can be
relatively succinct despite representing an entire document, since only the contents of the
body element need appear literally in the syntax. The other elements are still
present, but only by implication.
In the HTML syntax, authors need only remember to use U+0022
QUOTATION MARK characters (") to wrap the attribute contents and then to escape all U+0026
AMPERSAND (&) and U+0022 QUOTATION MARK (") characters, and to specify the sandbox attribute, to ensure safe embedding of content. (And
remember to escape ampersands before quotation marks, to ensure quotation marks become &quot;
and not &amp;quot;.)
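The escaping order described above can be sketched in a few lines of Python. This is an illustration only, not part of any specification; the function names `escape_srcdoc_html` and `iframe_tag` are hypothetical, and the input is assumed to be markup that is already valid as the content of the sandboxed document.

```python
def escape_srcdoc_html(markup: str) -> str:
    """Escape markup for a double-quoted srcdoc attribute in the HTML syntax.

    Hypothetical helper for illustration; `markup` is assumed to already be
    valid HTML for the sandboxed document.
    """
    # Ampersands first: escaping quotes first would later turn the "&" of
    # every new "&quot;" into "&amp;", producing "&amp;quot;".
    escaped = markup.replace("&", "&amp;")
    return escaped.replace('"', "&quot;")


def iframe_tag(markup: str) -> str:
    """Wrap the escaped markup in a sandboxed iframe start and end tag."""
    return f'<iframe sandbox srcdoc="{escape_srcdoc_html(markup)}"></iframe>'


comment = '<p>Yeah, you can see it <a href="/gallery?mode=cover&amp;page=1">in my gallery</a>.'
print(iframe_tag(comment))
```

Applied to the second blog comment, this would yield the doubly escaped `&amp;amp;` and the `&quot;` sequences that appear in the srcdoc attribute of the blog example.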
In XML the U+003C LESS-THAN SIGN character (<) needs to be escaped as well. In
order to prevent attribute-value
normalization, some of XML’s whitespace characters — specifically U+0009 CHARACTER
TABULATION (tab), U+000A LINE FEED (LF), and U+000D CARRIAGE RETURN (CR) — also need to be
escaped. [XML]
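For the XML case, a single-pass character mapping is one way to apply all of these escapes at once. The sketch below is an assumption-laden illustration: the names `XML_SRCDOC_ESCAPES` and `escape_srcdoc_xml` are hypothetical, and numeric character references are used for the whitespace characters so that attribute-value normalization leaves them intact.

```python
# Hypothetical mapping for illustration: each character that needs escaping
# in a double-quoted XML srcdoc attribute, and its replacement.
XML_SRCDOC_ESCAPES = {
    "&": "&amp;",
    '"': "&quot;",
    "<": "&lt;",
    "\t": "&#9;",   # U+0009 CHARACTER TABULATION
    "\n": "&#10;",  # U+000A LINE FEED
    "\r": "&#13;",  # U+000D CARRIAGE RETURN
}


def escape_srcdoc_xml(markup: str) -> str:
    """Escape markup for a double-quoted srcdoc attribute in XML.

    Each input character is mapped exactly once, so the ampersands
    introduced by the replacements are never escaped a second time.
    """
    return "".join(XML_SRCDOC_ESCAPES.get(ch, ch) for ch in markup)


print(escape_srcdoc_xml('<p>earl&amp;me\n"quoted"'))
```

Because every character is visited once, the ordering concern from the HTML syntax (ampersands before quotation marks) does not arise in this variant.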
If the src attribute and the srcdoc attribute are both specified together, the srcdoc attribute takes priority. This allows authors to provide
a fallback URL for legacy user agents that do not support the srcdoc attribute.
Although iframes are processed while in a shadow tree,
per the above, several other aspects of their behavior are not well-defined with regard to
shadow trees. See issue #763 for more
detail.
This is necessary in case url is something like about:blank?foo. If url is just plain about:blank, this will do nothing.
Return url.
To navigate an iframe or frame given an element
element, a URL url, a referrer policy referrerPolicy, an optional string-or-null srcdocString (default
null), and an optional boolean initialInsertion (default false):
Each Document has an iframe load in progress flag and a mute
iframe load flag. When a Document is created, these flags must be unset for
that Document.
To run the iframe load event steps, given an iframe element
element:
This, in conjunction with scripting, can be used to probe the URL space of the
local network’s HTTP servers. User agents may implement cross-origin
access control policies that are stricter than those described above to mitigate this attack, but
unfortunately such policies are typically not compatible with existing web content.
If an element type potentially delays the load event, then for each element
element of that type, the user agent must delay the load event of
element’s node document if element’s content
navigable is non-null and any of the following are true:
Each iframe element has an associated current navigation was lazy
loaded boolean, initially false. It is set and unset in the process the
iframe attributes algorithm.
Setting both the allow-scripts and allow-same-origin keywords together when the
embedded page has the same origin as the page containing the iframe
allows the embedded page to simply remove the sandbox
attribute and then reload itself, effectively breaking out of the sandbox altogether.
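A content pipeline could flag this combination before markup is published. The following Python sketch shows one possible check; the function names are hypothetical, and whether the embedded page shares the embedder's origin is assumed to be known to the caller rather than computed here.

```python
def sandbox_tokens(value: str) -> set:
    """Split a sandbox attribute value into lowercased keywords."""
    return {token.lower() for token in value.split()}


def can_escape_sandbox(value: str, same_origin_as_embedder: bool) -> bool:
    """Report whether the embedded page could remove its own sandbox.

    Hypothetical lint rule: the risk described above arises only when both
    keywords are present and the embedded page has the same origin as the
    page containing the iframe.
    """
    risky = {"allow-scripts", "allow-same-origin"}
    return same_origin_as_embedder and risky <= sandbox_tokens(value)


print(can_escape_sandbox("allow-scripts allow-same-origin", True))
print(can_escape_sandbox("allow-scripts allow-same-origin", False))
```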
These flags only take effect when the content navigable of the
iframe element is navigated. Removing them, or
removing the entire sandbox attribute, has no effect on
an already-loaded page.
Potentially hostile files should not be served from the same server as the file
containing the iframe element. Sandboxing hostile content is of minimal help if an
attacker can convince the user to just visit the hostile content directly, rather than in the
iframe. To limit the damage that can be caused by hostile HTML content, it should be
served from a separate dedicated domain. Using a different domain ensures that scripts in the
files are unable to attack the site, even if the user is tricked into visiting those pages
directly, without the protection of the sandbox
attribute.
In this example, some completely-unknown, potentially hostile, user-provided HTML content is
embedded in a page. Because it is served from a separate domain, it is affected by all the normal
cross-site restrictions. In addition, the embedded page has scripting disabled, plugins disabled,
forms disabled, and it cannot navigate any frames or windows other than itself (or any frames or
windows it itself embeds).
<p>We're not scared of you! Here is your content, unedited:</p>
<iframe sandbox src="https://usercontent.example.net/getusercontent.cgi?id=12193"></iframe>
It is important to use a separate domain so that if the attacker convinces the
user to visit that page directly, the page doesn’t run in the context of the site’s origin, which
would make the user vulnerable to any attack found in the page.
In this example, a gadget from another site is embedded. The gadget has scripting and forms
enabled, and the origin sandbox restrictions are lifted, allowing the gadget to communicate with
its originating server. The sandbox is still useful, however, as it disables plugins and popups,
thus reducing the risk of the user being exposed to malware and other annoyances.
For this example, suppose all the files were served as text/html.
Page C in this scenario has all the sandboxing flags set. Scripts are disabled, because the
iframe in A has scripts disabled, and this overrides the allow-scripts keyword set on the
iframe in B. Forms are also disabled, because the inner iframe (in B)
does not have the allow-forms keyword
set.
Suppose now that a script in A removes all the sandbox attributes in A and B.
This would change nothing immediately. If the user clicked the link in C, loading page D into
the iframe in B, page D would now act as if the iframe in B had the
allow-same-origin and allow-forms keywords set, because that was the
state of the content navigable in the iframe in A when page B was
loaded.
Generally speaking, dynamically removing or changing the sandbox attribute is ill-advised, because it can make it quite
hard to reason about what will be allowed and what will not.
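The nesting behavior in the A, B, C, D scenario can be modeled roughly as set intersection over the allowed keywords, captured at the time each page is loaded. The Python sketch below is a simplification under stated assumptions: it treats each keyword as an independent permission, uses `None` to mean "not sandboxed", and the function name `nested_allowances` is hypothetical.

```python
from typing import FrozenSet, Optional

Allowances = Optional[FrozenSet[str]]  # None means "no sandboxing applies"


def nested_allowances(inherited: Allowances, sandbox_attr: Optional[str]) -> Allowances:
    """Compute what a page loaded into an iframe is allowed to do.

    `inherited` is what the embedding page was allowed when it was loaded;
    `sandbox_attr` is the iframe's sandbox value at navigation time, or
    None if the attribute is absent. A page can never gain a permission
    that its embedder lacked, hence the intersection.
    """
    own = None if sandbox_attr is None else frozenset(sandbox_attr.lower().split())
    if inherited is None:
        return own
    if own is None:
        return inherited
    return inherited & own


# Page B loads inside A's iframe; page C loads inside B's iframe.
page_b = nested_allowances(None, "allow-same-origin allow-forms")
page_c = nested_allowances(page_b, "allow-scripts")
print(sorted(page_c))  # nothing is allowed in C

# After a script removes both sandbox attributes, D loads into B's iframe.
# B itself was not reloaded, so its snapshot still applies.
page_d = nested_allowances(page_b, None)
print(sorted(page_d))
```

In this model page C ends up with an empty set, matching "all the sandboxing flags set", while page D inherits exactly the allowances captured when page B was loaded.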
In this example, an iframe is used to embed a map from an online navigation
service. The allow attribute is used to enable the
Geolocation API within the nested context.
Here, an iframe is used to embed a player from a video site. The allowfullscreen attribute is needed to enable the
player to show its video fullscreen.
<article>
 <header>
  <p><img src="/usericons/1627591962735"> <b>Fred Flintstone</b></p>
  <p><a href="/posts/3095182851" rel=bookmark>12:44</a> — <a href="#acl-3095182851">Private Post</a></p>
 </header>
 <p>Check out my new ride!</p>
 <iframe src="https://video.example.com/embed?id=92469812" allowfullscreen></iframe>
</article>
The iframe element supports dimension attributes for cases where the
embedded content has specific dimensions (e.g. ad units have well-defined dimensions).
Descendants of iframe elements represent nothing. (In legacy user agents that do
not support iframe elements, the contents would be parsed as markup that could act as
fallback content.)
If the itemprop attribute is specified on an
embed element, then the src attribute must also
be specified.
The type attribute,
if present, gives the MIME type by which the plugin to instantiate is selected. The
value must be a valid MIME type string. If both the type attribute and the src
attribute are present, then the type attribute must specify
the same type as the explicit Content-Type metadata of the
resource given by the src attribute.
While any of the following conditions are occurring, any plugin instantiated for
the element must be removed, and the embed element represents
nothing:
The element has neither a src attribute nor a type attribute.
It is intentional that the above algorithm allows response to have a
non-ok status. This allows servers to return data for plugins even with error
responses (e.g., HTTP 500 Internal Server Error codes can still contain plugin data).
To display no plugin for an embed element element:
Depending on the type of content instantiated by the
object element, the node also supports other
interfaces.
The object element can represent an external resource, which, depending on the
type of the resource, will either be treated as an image or as a child
navigable.
If the user has indicated a preference that this object element’s fallback
content be shown instead of the element’s usual behavior, then jump to the step below
labeled fallback.
For example, a user could ask for the element’s fallback content to
be shown because that content uses a format that the user finds more accessible.
If the data attribute is present and its value is
not the empty string, then:
If the type attribute is present and its value is
not a type that the user agent supports, then the user agent may jump to the step below labeled
fallback without fetching the content to examine its real type.
If the resource is not yet available (e.g. because the resource was not available in the
cache, so that loading the resource required making a request over the network), then jump to
the step below labeled fallback. The task that is
queued by the networking task source once the
resource is available must restart this algorithm from this step. Resources can load
incrementally; user agents may opt to consider a resource “available” whenever enough data has
been obtained to begin processing the resource.
If the load failed (e.g. there was an HTTP 404 error, there was a DNS error), fire an event named error
at the element, then jump to the step below labeled fallback.
Determine the resource type, as follows:
Let the resource type be unknown.
If the user agent is configured to strictly obey Content-Type headers for this resource,
and the resource has associated Content-Type metadata,
then let the resource type be the type specified in the resource’s Content-Type metadata, and jump to the step below
labeled handler.
This can introduce a vulnerability, wherein a site is trying to embed a
resource that uses a particular type, but the remote site overrides that and instead
furnishes the user agent with a resource that triggers a different type of content with different
security characteristics.
Run the appropriate set of steps from the following
list:
If binary is false, then let the resource
type be the type specified in the resource’s
Content-Type metadata, and jump to the step below labeled handler.
If there is a type attribute present on the
object element, and its value is not application/octet-stream,
then run the following steps:
If the attribute’s value is a type that starts with “image/” that is
not also an XML MIME type, then let the resource type be the
type specified in that type attribute.
If tentative type is not application/octet-stream, then let resource type be
tentative type and jump to the step below labeled
handler.
If applying the URL parser algorithm to the URL of the
specified resource (after any redirects) results in a URL record whose path component matches a pattern that a plugin
supports, then let resource type be the type that that plugin can
handle.
For example, a plugin might say that it can handle resources with path components that end with the four character string
".swf".
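A path-suffix check of this kind can be approximated in Python as follows. This is a rough sketch: `urllib.parse.urlsplit` is not the URL parser algorithm referred to above and can differ from it on unusual input, the function name is hypothetical, and the URL passed in is assumed to be the final one after any redirects.

```python
from urllib.parse import urlsplit


def path_matches_suffix(url: str, suffix: str) -> bool:
    """Check whether the path component of `url` ends with `suffix`.

    Only the path is examined, so a matching string in the query or
    fragment does not count.
    """
    return urlsplit(url).path.endswith(suffix)


print(path_matches_suffix("https://example.com/movies/intro.swf?autoplay=1", ".swf"))
print(path_matches_suffix("https://example.com/player?file=intro.swf", ".swf"))
```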
It is possible for this step to finish, or for one of the substeps above to
jump straight to the next step, with resource type still being unknown. In
both cases, the next step will trigger fallback.
Handler: Handle the content as given by the first of the following cases that
matches:
If the resource type is an XML MIME type, or if the resource type
does not start with "image/"
Due to the algorithm above, the contents of object elements act as fallback
content, used only when referenced resources can’t be shown (e.g. because it returned a 404
error). This allows multiple object elements to be nested inside each other,
targeting multiple user agents with different capabilities, with the user agent picking the first
one it supports.