Fix some typos, whitespace and small errors in docs
allait committed Feb 27, 2014
1 parent 2e59d77 commit 210a0a6
Showing 5 changed files with 55 additions and 55 deletions.
28 changes: 14 additions & 14 deletions docs/intro/tutorial.rst
@@ -65,18 +65,18 @@ Defining our Item
=================

`Items` are containers that will be loaded with the scraped data; they work
like simple python dicts but provide additional protecting against populating
like simple python dicts but provide additional protection against populating
undeclared fields, to prevent typos.

They are declared by creating an :class:`scrapy.item.Item` class and defining
They are declared by creating a :class:`scrapy.item.Item` class and defining
its attributes as :class:`scrapy.item.Field` objects, like you will in an ORM
(don't worry if you're not familiar with ORMs, you will see that this is an
easy task).

We begin by modeling the item that we will use to hold the sites data obtained
from dmoz.org, as we want to capture the name, url and description of the
sites, we define fields for each of these three attributes. To do that, we edit
items.py, found in the ``tutorial`` directory. Our Item class looks like this::
``items.py``, found in the ``tutorial`` directory. Our Item class looks like this::

from scrapy.item import Item, Field

@@ -86,7 +86,7 @@ items.py, found in the ``tutorial`` directory. Our Item class looks like this::
desc = Field()

This may seem complicated at first, but defining the item allows you to use other handy
components of Scrapy that need to know how your item looks like.
components of Scrapy that need to know how your item looks.
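
The Item snippet above is cut off by the hunk boundaries; pieced together from the visible ``desc = Field()`` line and the name/url/description fields the text mentions, the class would look roughly like this (the ``title`` and ``link`` names are our reading of the tutorial, not quoted from the diff)::

    from scrapy.item import Item, Field

    class DmozItem(Item):
        # one Field per attribute we want to capture for each site
        title = Field()
        link = Field()
        desc = Field()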

Our first Spider
================
@@ -97,8 +97,8 @@ of domains).
They define an initial list of URLs to download, how to follow links, and how
to parse the contents of those pages to extract :ref:`items <topics-items>`.

To create a Spider, you must subclass :class:`scrapy.spider.Spider`, and
define the three main, mandatory, attributes:
To create a Spider, you must subclass :class:`scrapy.spider.Spider` and
define the three main mandatory attributes:

* :attr:`~scrapy.spider.Spider.name`: identifies the Spider. It must be
unique, that is, you can't set the same name for different Spiders.
@@ -162,7 +162,7 @@ will get an output similar to this::
Pay attention to the lines containing ``[dmoz]``, which corresponds to our
spider. You can see a log line for each URL defined in ``start_urls``. Because
these URLs are the starting ones, they have no referrers, which is shown at the
end of the log line, where it says ``(referer: <None>)``.
end of the log line, where it says ``(referer: None)``.

But more interesting, as our ``parse`` method instructs, two files have been
created: *Books* and *Resources*, with the content of both URLs.
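
For orientation, since the spider's own code sits outside the hunks shown here, a minimal spider of the kind the tutorial describes might look like the following sketch (the body is our reconstruction, not the tutorial's verbatim code)::

    from scrapy.spider import Spider

    class DmozSpider(Spider):
        name = "dmoz"                    # must be unique across spiders
        allowed_domains = ["dmoz.org"]
        start_urls = [
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
        ]

        def parse(self, response):
            # save each page body to a file named after the last URL segment;
            # this is what produces the *Books* and *Resources* files
            filename = response.url.split("/")[-2]
            with open(filename, 'wb') as f:
                f.write(response.body)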
@@ -210,15 +210,15 @@ XPath expressions are indeed much more powerful. To learn more about XPath we
recommend `this XPath tutorial <http://www.w3schools.com/XPath/default.asp>`_.

For working with XPaths, Scrapy provides a :class:`~scrapy.selector.Selector`
class, it is instantiated with a :class:`~scrapy.http.HtmlResponse` or
class, which is instantiated with a :class:`~scrapy.http.HtmlResponse` or
:class:`~scrapy.http.XmlResponse` object as first argument.

You can see selectors as objects that represent nodes in the document
structure. So, the first instantiated selectors are associated to the root
structure. So, the first instantiated selectors are associated with the root
node, or the entire document.

Selectors have four basic methods (click on the method to see the complete API
documentation).
documentation):

* :meth:`~scrapy.selector.Selector.xpath`: returns a list of selectors, each of
them representing the nodes selected by the xpath expression given as
@@ -275,7 +275,7 @@ After the shell loads, you will have the response fetched in a local
``response`` variable, so if you type ``response.body`` you will see the body
of the response, or you can type ``response.headers`` to see its headers.

The shell also pre-instantiate a selector for this response in variable ``sel``,
The shell also pre-instantiates a selector for this response in variable ``sel``,
the selector automatically chooses the best parsing rules (XML vs HTML) based
on response's type.
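
In practice that means a session along these lines (an illustrative sketch, not the tutorial's exact transcript)::

    response.headers                       # the response headers mentioned above
    response.body[:200]                    # a slice of the raw page body
    sel.xpath('//title/text()').extract()  # the pre-instantiated selector at work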

@@ -327,7 +327,7 @@ And the sites links::

sel.xpath('//ul/li/a/@href').extract()

As we said before, each ``.xpath()`` call returns a list of selectors, so we can
As we've said before, each ``.xpath()`` call returns a list of selectors, so we can
concatenate further ``.xpath()`` calls to dig deeper into a node. We are going to use
that property here, so::
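
    # (the snippet that follows here in the original file is elided by the
    #  hunk boundary; this is a sketch of the nested .xpath() calls the
    #  paragraph describes, with the field expressions assumed)
    sites = sel.xpath('//ul/li')
    for site in sites:
        title = site.xpath('a/text()').extract()
        link = site.xpath('a/@href').extract()
        desc = site.xpath('text()').extract()
        print title, link, desc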

@@ -418,7 +418,7 @@ scraped so far, the final code for our Spider would be like this::
.. note:: You can find a fully-functional variant of this spider in the dirbot_
project available at https://github.com/scrapy/dirbot

Now doing a crawl on the dmoz.org domain yields ``DmozItem``'s::
Now doing a crawl on the dmoz.org domain yields ``DmozItem`` objects::

[dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'desc': [u' - By David Mertz; Addison Wesley. Book in progress, full text, ASCII format. Asks for feedback. [author website, Gnosis Software, Inc.\n],
@@ -445,7 +445,7 @@ However, if you want to perform more complex things with the scraped items, you
can write an :ref:`Item Pipeline <topics-item-pipeline>`. As with Items, a
placeholder file for Item Pipelines has been set up for you when the project is
created, in ``tutorial/pipelines.py``. Though you don't need to implement any item
pipeline if you just want to store the scraped items.
pipelines if you just want to store the scraped items.
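
If you do write one, the placeholder in ``tutorial/pipelines.py`` only needs a ``process_item`` method; a minimal, purely illustrative sketch::

    class TutorialPipeline(object):
        def process_item(self, item, spider):
            # inspect, clean or drop the item here; returning it
            # passes it on to the next pipeline stage, if any
            return item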

Next steps
==========
22 changes: 11 additions & 11 deletions docs/topics/commands.rst
@@ -7,8 +7,8 @@ Command line tool
.. versionadded:: 0.10

Scrapy is controlled through the ``scrapy`` command-line tool, to be referred
here as the "Scrapy tool" to differentiate it from their sub-commands which we
just call "commands", or "Scrapy commands".
here as the "Scrapy tool" to differentiate it from the sub-commands, which we
just call "commands" or "Scrapy commands".

The Scrapy tool provides several commands, for multiple purposes, and each one
accepts a different set of arguments and options.
@@ -214,7 +214,7 @@ crawl
* Syntax: ``scrapy crawl <spider>``
* Requires project: *yes*

Start crawling a spider.
Start crawling using a spider.

Usage examples::

@@ -297,13 +297,13 @@ Downloads the given URL using the Scrapy downloader and writes the contents to
standard output.

The interesting thing about this command is that it fetches the page how the
the spider would download it. For example, if the spider has an ``USER_AGENT``
spider would download it. For example, if the spider has an ``USER_AGENT``
attribute which overrides the User Agent, it will use that one.

So this command can be used to "see" how your spider would fetch certain page.
So this command can be used to "see" how your spider would fetch a certain page.

If used outside a project, no particular per-spider behaviour would be applied
and it will just use the default Scrapy downloder settings.
and it will just use the default Scrapy downloader settings.

Usage examples::

@@ -346,7 +346,7 @@ shell
* Syntax: ``scrapy shell [url]``
* Requires project: *no*

Starts the Scrapy shell for the given URL (if given) or empty if not URL is
Starts the Scrapy shell for the given URL (if given) or empty if no URL is
given. See :ref:`topics-shell` for more info.

Usage example::
@@ -362,7 +362,7 @@ parse
* Syntax: ``scrapy parse <url> [options]``
* Requires project: *yes*

Fetches the given URL and parses with the spider that handles it, using the
Fetches the given URL and parses it with the spider that handles it, using the
method passed with the ``--callback`` option, or ``parse`` if not given.

Supported options:
@@ -371,7 +371,7 @@ Supported options:
response

* ``--rules`` or ``-r``: use :class:`~scrapy.contrib.spiders.CrawlSpider`
rules to discover the callback (ie. spider method) to use for parsing the
rules to discover the callback (i.e. spider method) to use for parsing the
response

* ``--noitems``: don't show scraped items
@@ -467,7 +467,7 @@ bench
* Syntax: ``scrapy bench``
* Requires project: *no*

Run quick benchmark test. :ref:`benchmarking`.
Run a quick benchmark test. :ref:`benchmarking`.

Custom project commands
=======================
@@ -484,7 +484,7 @@ COMMANDS_MODULE

Default: ``''`` (empty string)

A module to use for looking custom Scrapy commands. This is used to add custom
A module to use for looking up custom Scrapy commands. This is used to add custom
commands for your Scrapy project.

Example::
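
    # a sketch of the setting; the module path is purely illustrative
    COMMANDS_MODULE = 'mybot.commands'
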
8 changes: 4 additions & 4 deletions docs/topics/items.rst
@@ -47,11 +47,11 @@ Item Fields

:class:`Field` objects are used to specify metadata for each field. For
example, the serializer function for the ``last_updated`` field illustrated in
the example above.
the example above.

You can specify any kind of metadata for each field. There is no restriction on
the values accepted by :class:`Field` objects. For this same
reason, there isn't a reference list of all available metadata keys. Each key
reason, there is no reference list of all available metadata keys. Each key
defined in :class:`Field` objects could be used by a different components, and
only those components know about it. You can also define and use any other
:class:`Field` key in your project too, for your own needs. The main goal of
@@ -62,9 +62,9 @@ documentation to see which metadata keys are used by each component.

It's important to note that the :class:`Field` objects used to declare the item
do not stay assigned as class attributes. Instead, they can be accessed through
the :attr:`Item.fields` attribute.
the :attr:`Item.fields` attribute.
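
Putting those two points together, a small sketch (the ``Product`` item and the ``serializer`` key are illustrative, echoing the example the text refers to)::

    from scrapy.item import Item, Field

    class Product(Item):
        name = Field()
        last_updated = Field(serializer=str)   # arbitrary metadata key

    # Field objects don't stay as class attributes; they live in Item.fields
    print Product.fields['last_updated'].get('serializer')   # prints <type 'str'>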

And that's all you need to know about declaring items.
And that's all you need to know about declaring items.

Working with Items
==================
14 changes: 7 additions & 7 deletions docs/topics/loaders.rst
@@ -227,7 +227,7 @@ value and extracts a length from it::
return parsed_length

By accepting a ``loader_context`` argument the function is explicitly telling
the Item Loader that is able to receive an Item Loader context, so the Item
the Item Loader that it's able to receive an Item Loader context, so the Item
Loader passes the currently active context when calling it, and the processor
function (``parse_length`` in this case) can thus use them.
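
The body of ``parse_length`` is largely outside the hunk shown above; a runnable sketch of such a processor (the parsing logic is assumed, only the ``loader_context`` handling follows the docs)::

    def parse_length(text, loader_context):
        unit = loader_context.get('unit', 'm')   # value supplied by the active context
        # e.g. turn '100 cm' into 100.0 when the context says unit='cm'
        number, _, text_unit = text.strip().partition(' ')
        parsed_length = float(number) if text_unit == unit else None
        return parsed_length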

@@ -245,7 +245,7 @@ There are several ways to modify Item Loader context values:
loader = ItemLoader(product, unit='cm')

3. On Item Loader declaration, for those input/output processors that support
instatiating them with a Item Loader context. :class:`~processor.MapCompose` is one of
instantiating them with an Item Loader context. :class:`~processor.MapCompose` is one of
them::

class ProductLoader(ItemLoader):
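    # (the rest of this declaration is elided by the diff; a hedged guess at
    #  its shape -- MapCompose instantiated with a ``unit`` context value)
    length_out = MapCompose(parse_length, unit='cm')
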
@@ -486,7 +486,7 @@ Reusing and extending Item Loaders
==================================

As your project grows bigger and acquires more and more spiders, maintenance
becomes a fundamental problem, specially when you have to deal with many
becomes a fundamental problem, especially when you have to deal with many
different parsing rules for each spider, having a lot of exceptions, but also
wanting to reuse the common processors.

@@ -497,7 +497,7 @@ support traditional Python class inheritance for dealing with differences of
specific spiders (or groups of spiders).

Suppose, for example, that some particular site encloses their product names in
three dashes (ie. ``---Plasma TV---``) and you don't want to end up scraping
three dashes (e.g. ``---Plasma TV---``) and you don't want to end up scraping
those dashes in the final product names.

Here's how you can remove those dashes by reusing and extending the default
@@ -567,7 +567,7 @@ Here is a list of all built-in processors:

.. class:: TakeFirst

Return the first non-null/non-empty value from the values received,
Returns the first non-null/non-empty value from the values received,
so it's typically used as an output processor to single-valued fields.
It doesn't receive any constructor arguments, nor accept Loader contexts.
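
For instance (a sketch; the import path follows the ``scrapy.contrib`` layout referenced elsewhere on this page)::

    >>> from scrapy.contrib.loader.processor import TakeFirst
    >>> proc = TakeFirst()
    >>> proc(['', 'one', 'two', 'three'])
    'one'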

@@ -604,8 +604,8 @@ Here is a list of all built-in processors:
function, and so on, until the last function returns the output value of
this processor.

By default, stop process on None value. This behaviour can be changed by
passing keyword argument stop_on_none=False.
By default, stop process on ``None`` value. This behaviour can be changed by
passing keyword argument ``stop_on_none=False``.

Example::
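
    >>> # (sketch; the docs' own example is elided by the hunk boundary)
    >>> from scrapy.contrib.loader.processor import Compose
    >>> proc = Compose(lambda v: v[0], str.upper)
    >>> proc(['hello', 'world'])
    'HELLO'
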
