Chrome Extensions: Content Scripts

Content scripts are JavaScript files that run in the context of web pages. By using the standard Document Object Model (DOM), they can read details of the web pages the browser visits, or make changes to them.

Here are some examples of what content scripts can do:

  • Find unlinked URLs in web pages and convert them into hyperlinks
  • Increase the font size to make text more legible
  • Find and process microformat data in the DOM
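As a rough sketch of the first item, a content script could walk the page's text nodes and wrap bare URLs in anchor elements. The helper names (splitUrls, linkifyTextNode) and the deliberately simple URL pattern are assumptions made for illustration; they are not part of any Chrome API.

```javascript
// Hypothetical helper: split text into plain and URL segments.
// The pattern is deliberately simple and will miss or over-match some URLs.
function splitUrls(text) {
  var pattern = /https?:\/\/[^\s<>"']+/g;
  var segments = [];
  var last = 0;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    if (match.index > last) {
      segments.push({ text: text.slice(last, match.index), isUrl: false });
    }
    segments.push({ text: match[0], isUrl: true });
    last = match.index + match[0].length;
  }
  if (last < text.length) {
    segments.push({ text: text.slice(last), isUrl: false });
  }
  return segments;
}

// Replace one text node with a mix of text nodes and <a> elements.
function linkifyTextNode(node) {
  var segments = splitUrls(node.nodeValue);
  if (!segments.some(function (s) { return s.isUrl; })) return;
  var fragment = document.createDocumentFragment();
  segments.forEach(function (s) {
    if (s.isUrl) {
      var a = document.createElement("a");
      a.href = s.text;
      a.textContent = s.text;
      fragment.appendChild(a);
    } else {
      fragment.appendChild(document.createTextNode(s.text));
    }
  });
  node.parentNode.replaceChild(fragment, node);
}

// Only touch the DOM when one exists (that is, when running in a page).
if (typeof document !== "undefined" && document.body) {
  var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  var nodes = [];
  while (walker.nextNode()) {
    var parent = walker.currentNode.parentNode;
    // Skip text that is already linked or that is not displayed as prose.
    if (!parent.closest("a, script, style, textarea")) {
      nodes.push(walker.currentNode);
    }
  }
  nodes.forEach(linkifyTextNode);
}
```

The text nodes are collected first and rewritten afterwards, so the tree walker never iterates over nodes that are being replaced.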

However, content scripts have some limitations. They cannot:

  • Use most chrome.* APIs directly
  • Use variables or functions defined by their extension's pages
  • Use variables or functions defined by web pages or by other content scripts

These limitations aren't as bad as they sound. Content scripts can indirectly use the chrome.* APIs, get access to extension data, and request extension actions by exchanging messages with their parent extension. Content scripts can also make cross-site XMLHttpRequests to the same sites as their parent extensions, and they can communicate with web pages using the shared DOM. For more insight into what content scripts can and can't do, learn about the execution environment.
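A minimal sketch of such a message exchange follows, assuming the extension has a background page. The message shape ({ type: "GET_SETTING", key: ... }), the settings object, and the function names are hypothetical; chrome.runtime.sendMessage and chrome.runtime.onMessage are the messaging calls being illustrated. In a real extension the two halves would live in separate files.

```javascript
// --- Background page (hypothetical data and handler) ---
var settings = { fontScale: 1.25 };

function handleMessage(message, sender, sendResponse) {
  if (message && message.type === "GET_SETTING") {
    // Reply with extension data the content script cannot read directly.
    sendResponse({ value: settings[message.key] });
  }
}

// --- Content script ---
// runtime is chrome.runtime in a real extension; it is a parameter here
// only so the sketch can be exercised outside the browser.
function requestSetting(runtime, key, callback) {
  runtime.sendMessage({ type: "GET_SETTING", key: key }, function (response) {
    callback(response ? response.value : undefined);
  });
}

// Register the handler only where the extension APIs exist.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(handleMessage);
}
```

A content script would then call something like requestSetting(chrome.runtime, "fontScale", function (value) { ... }) and apply the value to the page.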

If your content script's code should always be injected, register it in the extension manifest using the content_scripts field, as in the following example.

{
  "name": "My extension",
  ...
  "content_scripts": [
    {
      "matches": ["http://www.google.com/*"],
      "css": ["mystyles.css"],
      "js": ["jquery.js", "myscript.js"]
    }
  ],
  ...
}

If you want to inject the code only sometimes, use the permissions field instead, as described in Programmatic injection.

{
  "name": "My extension",
  ...
  "permissions": [
    "tabs", "http://www.google.com/*"
  ],
  ...
}

Using the content_scripts field, an extension can insert multiple content scripts into a page; each of these content scripts can have multiple JavaScript and CSS files. Each item in the content_scripts array can have the following properties:

Name | Type | Description
matches | array of strings | Required. Specifies which pages this content script will be injected into. See Match Patterns for more details on the syntax of these strings and Match patterns and globs for information on how to exclude URLs.
exclude_matches | array of strings | Optional. Excludes pages that this content script would otherwise be injected into. See Match Patterns for more details on the syntax of these strings and Match patterns and globs for information on how to exclude URLs.
match_about_blank | boolean | Optional. Whether to insert the content script on about:blank and about:srcdoc. Content scripts will only be injected on pages when their inherit URL is matched by one of the declared patterns in the matches field. The inherit URL is the URL of the document that created the frame or window. Content scripts cannot be inserted in sandboxed frames. Defaults to false.
css | array of strings | Optional. The list of CSS files to be injected into matching pages. These are injected in the order they appear in this array, before any DOM is constructed or displayed for the page.
js | array of strings | Optional. The list of JavaScript files to be injected into matching pages. These are injected in the order they appear in this array.
run_at | string | Optional. Controls when the files in js are injected. Can be "document_start", "document_end", or "document_idle". Defaults to "document_idle". With "document_start", the files are injected after any files from css, but before any other DOM is constructed or any other script is run. With "document_end", the files are injected immediately after the DOM is complete, but before subresources like images and frames have loaded. With "document_idle", the browser chooses a time to inject scripts between "document_end" and immediately after the window.onload event fires. The exact moment of injection depends on how complex the document is and how long it is taking to load, and is optimized for page load speed. Note: With "document_idle", content scripts may not necessarily receive the window.onload event, because they may run after it has already fired. In most cases, listening for the onload event is unnecessary for content scripts running at "document_idle" because they are guaranteed to run after the DOM is complete. If your script definitely needs to run after window.onload, you can check if onload has already fired by using the document.readyState property.
all_frames | boolean | Optional. Controls whether the content script runs in all frames of the matching page, or only the top frame. Defaults to false, meaning that only the top frame is matched.
include_globs | array of strings | Optional. Applied after matches to include only those URLs that also match this glob. Intended to emulate the @include Greasemonkey keyword. See Match patterns and globs below for more details.
exclude_globs | array of strings | Optional. Applied after matches to exclude URLs that match this glob. Intended to emulate the @exclude Greasemonkey keyword. See Match patterns and globs below for more details.
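The note on "document_idle" in the run_at description can be sketched as follows: check document.readyState first, and only wait for the load event if the page has not finished loading. The function name runAfterLoad and its optional doc and win parameters are assumptions made for this sketch.

```javascript
// Run callback once the page has finished loading, whether or not the
// load event has already fired. doc and win default to the page's own
// globals; they are parameters only so the sketch can be tested.
function runAfterLoad(callback, doc, win) {
  doc = doc || document;
  win = win || window;
  if (doc.readyState === "complete") {
    // The document and its subresources are already loaded.
    callback();
  } else {
    win.addEventListener("load", function () { callback(); }, false);
  }
}
```

In a content script this would be called as runAfterLoad(function () { ... }) with no extra arguments.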

The content script will be injected into a page if its URL matches any matches pattern and any include_globs pattern, as long as the URL doesn't also match an exclude_matches or exclude_globs pattern. Because the matches property is required, exclude_matches, include_globs, and exclude_globs can only be used to limit which pages will be affected.

For example, assume matches is ["http://*.nytimes.com/*"]:

  • If exclude_matches is ["*://*/*business*"], then the content script would be injected into "http://www.nytimes.com/health" but not into "http://www.nytimes.com/business".
  • If include_globs is ["*nytimes.com/???s/*"], then the content script would be injected into "http://www.nytimes.com/arts/index.html" and "http://www.nytimes.com/jobs/index.html" but not into "http://www.nytimes.com/sports/index.html".
  • If exclude_globs is ["*science*"], then the content script would be injected into "http://www.nytimes.com" but not into "http://science.nytimes.com" or "http://www.nytimes.com/science".
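The way these four properties combine can be sketched as a single predicate. This is a simplified model, not Chrome's implementation: matchesPattern handles only the scheme://host/path form with "*" wildcards and ignores details such as ports and special schemes, and all function names are hypothetical.

```javascript
// Convert a wildcard string to a RegExp. "*" matches any string;
// "?" matches one character only when allowQuestion is true (globs).
function wildcardToRegExp(pattern, allowQuestion) {
  var source = pattern.replace(/[.+^${}()|[\]\\*?]/g, function (ch) {
    if (ch === "*") return ".*";
    if (ch === "?" && allowQuestion) return ".";
    return "\\" + ch;
  });
  return new RegExp("^" + source + "$");
}

// Simplified match-pattern test for <scheme>://<host><path>.
function matchesPattern(pattern, url) {
  var parts = /^(\*|https?|file|ftp):\/\/([^\/]*)(\/.*)$/.exec(pattern);
  if (!parts) return false;
  var target = new URL(url);
  var scheme = target.protocol.slice(0, -1);
  if (parts[1] === "*") {
    if (scheme !== "http" && scheme !== "https") return false;
  } else if (parts[1] !== scheme) {
    return false;
  }
  var host = parts[2];
  if (host !== "*") {
    if (host.indexOf("*.") === 0) {
      var base = host.slice(2);
      if (target.hostname !== base && !target.hostname.endsWith("." + base)) return false;
    } else if (host !== target.hostname) {
      return false;
    }
  }
  return wildcardToRegExp(parts[3], false).test(target.pathname + target.search);
}

function globMatches(glob, url) {
  return wildcardToRegExp(glob, true).test(url);
}

// rule has the same shape as one item of the content_scripts array.
function shouldInject(url, rule) {
  var matched = rule.matches.some(function (p) { return matchesPattern(p, url); });
  var included = !rule.include_globs ||
      rule.include_globs.some(function (g) { return globMatches(g, url); });
  var excluded =
      (rule.exclude_matches || []).some(function (p) { return matchesPattern(p, url); }) ||
      (rule.exclude_globs || []).some(function (g) { return globMatches(g, url); });
  return matched && included && !excluded;
}
```

Because matched is always required, the other three properties can only narrow the set of pages, which is the point made in the text above.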

Glob properties follow a different, more flexible syntax than match patterns. Acceptable glob strings are URLs that may contain "wildcard" asterisks and question marks. The asterisk (*) matches any string of any length (including the empty string); the question mark (?) matches any single character.

For example, the glob "http://???.example.com/foo/*" matches any of the following:

  • "http://www.example.com/foo/bar"
  • "http://the.example.com/foo/"

However, it does not match the following:

  • "http://my.example.com/foo/bar"
  • "http://example.com/foo/"
  • "http://www.example.com/foo"
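The glob rules above are small enough to implement directly. The following sketch is one possible matcher, written without regular expressions so that no escaping is needed; the function name globMatches is an assumption, and Chrome's own matcher may differ in details.

```javascript
// Return true if text matches glob, where "*" matches any string
// (including the empty string) and "?" matches exactly one character.
function globMatches(glob, text) {
  var g = 0;      // position in glob
  var t = 0;      // position in text
  var star = -1;  // position of the most recent "*" in glob
  var mark = 0;   // text position where that "*" started matching
  while (t < text.length) {
    if (g < glob.length && glob[g] === "*") {
      // Remember the star and first try matching it against nothing.
      star = g;
      mark = t;
      g++;
    } else if (g < glob.length && (glob[g] === "?" || glob[g] === text[t])) {
      g++;
      t++;
    } else if (star !== -1) {
      // Mismatch: let the last star absorb one more character and retry.
      g = star + 1;
      mark++;
      t = mark;
    } else {
      return false;
    }
  }
  // Any trailing stars can match the empty string.
  while (g < glob.length && glob[g] === "*") g++;
  return g === glob.length;
}
```

Applied to the glob "http://???.example.com/foo/*", this accepts the two matching URLs listed above and rejects the three non-matching ones.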

Inserting code into a page programmatically is useful when your JavaScript or CSS code shouldn't be injected into every single page that matches the pattern — for example, if you want a script to run only when the user clicks a browser action's icon.

To insert code into a page, your extension must have cross-origin permissions for the page. It also must be able to use the chrome.tabs module. You can get both kinds of permission using the manifest file's permissions field.

Once you have permissions set up, you can inject JavaScript into a page by calling tabs.executeScript. To inject CSS, use tabs.insertCSS.

The following code (from the make_page_red example) reacts to a user click by inserting JavaScript into the current tab's page and executing the script.

chrome.browserAction.onClicked.addListener(function(tab) {
  chrome.tabs.executeScript({
    code: 'document.body.style.backgroundColor="red"'
  });
});

The extension's manifest requests the activeTab permission:

"permissions": [
  "activeTab"
],

When the browser is displaying an HTTP page and the user clicks this extension's browser action, the extension sets the background color of the page's body to red. Unless the page has CSS that overrides the background color, the page turns red.

Usually, instead of inserting code directly (as in the previous sample), you put the code in a file. You inject the file's contents like this:

chrome.tabs.executeScript(null, {file: "content_script.js"});
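CSS files can be injected the same way with tabs.insertCSS. The sketch below injects a stylesheet and then a script, using the optional callbacks to sequence the two calls. The file name content_style.css and the function name injectFiles are hypothetical, and the tabs API is passed in as a parameter only so the example can run outside the browser.

```javascript
// tabsApi is chrome.tabs in a real extension.
function injectFiles(tabsApi, done) {
  // Passing null as the tab ID targets the active tab of the current window.
  tabsApi.insertCSS(null, { file: "content_style.css" }, function () {
    tabsApi.executeScript(null, { file: "content_script.js" }, function (results) {
      done(results);
    });
  });
}

// Hook the injection up to the browser action when running as an extension.
if (typeof chrome !== "undefined" && chrome.browserAction && chrome.tabs) {
  chrome.browserAction.onClicked.addListener(function () {
    injectFiles(chrome.tabs, function () {});
  });
}
```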

Content scripts execute in a special environment called an isolated world. They have access to the DOM of the page they are injected into, but not to any JavaScript variables or functions created by the page. It looks to each content script as if there is no other JavaScript executing on the page it is running on. The same is true in reverse: JavaScript running on the page cannot call any functions or access any variables defined by content scripts.

For example, consider this simple page, hello.html:

<html>
  <button id="mybutton">click me</button>
  <script>
    var greeting = "hello, ";
    var button = document.getElementById("mybutton");
    button.person_name = "Bob";
    button.addEventListener("click", function() {
      alert(greeting + button.person_name + ".");
    }, false);
  </script>
</html>

Now, suppose this content script was injected into hello.html:

var greeting = "hola, ";
var button = document.getElementById("mybutton");
button.person_name = "Roberto";
button.addEventListener("click", function() {
  alert(greeting + button.person_name + ".");
}, false);

Now, if the button is pressed, you will see both greetings: "hello, Bob." from the page and "hola, Roberto." from the content script.

Isolated worlds allow each content script to make changes to its JavaScript environment without worrying about conflicting with the page or with other content scripts. For example, a content script could include jQuery v1 and the page could include jQuery v2, and they wouldn't conflict with each other.

Another important benefit of isolated worlds is that they completely separate the JavaScript on the page from the JavaScript in extensions. This allows us to offer extra functionality to content scripts that should not be accessible from web pages without worrying about web pages accessing it.

It's worth noting what happens with JavaScript objects that are shared by the page and the extension - for example, the window.onload event. Each isolated world sees its own version of the object. Assigning to the object affects your independent copy of the object. For example, both the page and extension can assign to window.onload, but neither one can read the other's event handler. The event handlers are called in the order in which they were assigned.

Although the execution environments of content scripts and the pages that host them are isolated from each other, they share access to the page's DOM. If the page wishes to communicate with the content script (or with the extension via the content script), it must do so through the shared DOM.

For example, the page can send messages with window.postMessage (or window.webkitPostMessage for Transferable objects). The content script listens for them and forwards them to the extension:

var port = chrome.runtime.connect();

window.addEventListener("message", function(event) {
  // We only accept messages from ourselves
  if (event.source != window)
    return;

  if (event.data.type && (event.data.type == "FROM_PAGE")) {
    console.log("Content script received: " + event.data.text);
    port.postMessage(event.data.text);
  }
}, false);

The page, example.html, posts a message when its button is clicked:

document.getElementById("theButton").addEventListener("click", function() {
  window.postMessage({ type: "FROM_PAGE", text: "Hello from the webpage!" }, "*");
}, false);

In the above example, example.html (which is not a part of the extension) posts messages to itself, which are intercepted and inspected by the content script, and then posted to the extension process. In this way, the page establishes a line of communication to the extension process. The reverse is possible through similar means.
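The reverse direction might look like the following sketch, in which the content script posts a message to the window and the page filters for it. The message type "FROM_CONTENT_SCRIPT" and both function names are assumptions chosen to mirror the example above.

```javascript
// Content script side: pass text from the extension on to the page.
function notifyPage(win, text) {
  win.postMessage({ type: "FROM_CONTENT_SCRIPT", text: text }, "*");
}

// Page side: accept only messages posted on this same window that carry
// the expected type, and hand their text to handler.
function listenForContentScript(win, handler) {
  win.addEventListener("message", function (event) {
    if (event.source !== win) return;
    if (event.data && event.data.type === "FROM_CONTENT_SCRIPT") {
      handler(event.data.text);
    }
  }, false);
}
```

The page would call listenForContentScript(window, ...) in its own script, and the content script would call notifyPage(window, ...) after receiving data from the extension. Because the type differs from "FROM_PAGE", the content script's own listener ignores these messages.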

When writing a content script, you should be aware of two security issues. First, be careful not to introduce security vulnerabilities into the web site your content script is injected into. For example, if your content script receives content from another web site (for example, by making an XMLHttpRequest), be careful to filter that content for cross-site scripting attacks before injecting the content into the current page. For example, prefer to inject content via innerText rather than innerHTML. Be especially careful when retrieving HTTP content on an HTTPS page because the HTTP content might have been corrupted by a network "man-in-the-middle" if the user is on a hostile network.
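As a small illustration of the innerText advice, the sketch below inserts text obtained from another site without letting the browser parse it as HTML. The function name and its arguments are hypothetical.

```javascript
// Insert text fetched from another site into an element of the page.
function showUntrustedText(element, text) {
  // Risky alternative: element.innerHTML = text; would parse the text as
  // markup, so attacker-controlled elements and event-handler attributes
  // could end up in the page.
  element.innerText = text;
}
```

With this approach a response such as "<img src=x onerror=...>" is displayed literally instead of being turned into an element.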

Second, although running your content script in an isolated world provides some protection from the web page, a malicious web page might still be able to attack your content script if you use content from the web page indiscriminately. For example, the following patterns are dangerous:

var data = document.getElementById("json-data").textContent;
// WARNING! Might be evaluating an evil script!
var parsed = eval("(" + data + ")");

var elmt_id = ...
// WARNING! elmt_id might be "); ... evil script ... //"!
window.setTimeout("animate(" + elmt_id + ")", 200);

Instead, prefer safer APIs that do not run scripts:

var data = document.getElementById("json-data").textContent;
// JSON.parse does not evaluate the attacker's scripts.
var parsed = JSON.parse(data);

var elmt_id = ...
// The closure form of setTimeout does not evaluate scripts.
window.setTimeout(function() {
  animate(elmt_id);
}, 200);

Get the URL of an extension's file using chrome.extension.getURL(). You can use the result just like you would any other URL, as the following code shows.

//Code for displaying <extensionDir>/images/myimage.png:
var imgURL = chrome.extension.getURL("images/myimage.png");
document.getElementById("someImage").src = imgURL;

You can find many examples that use content scripts. A simple example of communication via messages is in the Message Timer. See Page Redder and Email This Page for examples of programmatic injection.

The following videos discuss concepts that are important for content scripts. The first video describes content scripts and isolated worlds.

The next video describes message passing, featuring an example of a content script sending a request to its parent extension.

Original article: Content Scripts
