= How to Install =
== Assumptions ==
This document describes how we have set up !OpenAustralia on our development machines. You can use it as a starting point for a server setup, but a server might require a different approach unless you have complete control of it (i.e. you are root) and can install all the dependencies, secure the machine, etc.
Configuring Apache, PHP, MySQL or any other application for optimal performance is beyond the scope of this document. You should be able to find enough information online to help you along the way.
These steps have only been tested on Mac OS X 10.5 (Leopard). They might also work on other Unix derivatives.
== Requirements ==
* Unix
* Apache + PHP + MySQL (we've tested with Apache 2.X.X, PHP5, MySQL 5.0.x)
* Ruby (we've used the version included in Leopard)
* the following rubygems
  * mechanize
  * builder
  * RMagick (this has dependencies like !GraphicsMagick/!ImageMagick, which in turn needs ghostscript)
  * rcov
* [http://git.or.cz/ git]
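Before going further it can help to check which of the command-line tools above are already on your PATH. The following is a minimal sketch in Ruby, not part of !OpenAustralia. The helper names ({{{find_executable}}}, {{{missing_commands}}}) are made up for illustration, and the command names in {{{COMMANDS}}} are assumptions: Apache in particular may be installed as {{{httpd}}} or {{{apache2}}}, so it is left out.

```ruby
# Returns the full path of +command+ if an executable file with that name
# exists in one of the directories listed in +path+, or nil otherwise.
def find_executable(command, path = ENV['PATH'])
  path.to_s.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, command)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Returns the commands from +commands+ that could not be found.
def missing_commands(commands, path = ENV['PATH'])
  commands.reject { |command| find_executable(command, path) }
end

# Assumed command names for the requirements listed above.
COMMANDS = %w[ruby gem perl php mysql git]

missing = missing_commands(COMMANDS)
if missing.empty?
  puts 'All commands found'
else
  puts "Missing: #{missing.join(', ')}"
end
```

This only tells you that a command exists, not that its version matches the ones we have tested with.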
== How to install all the dependencies ==
=== Mac OS X Leopard ===
Apache, PHP and Ruby all come with Leopard. If you need to install any of these
on Mac OS X (if for whatever reason you don't have them installed) there's a ton
of information online:
1. "Entropy's Instructions":http://www.entropy.ch/software/macosx/
1. "Hivelogic's Instructions":http://hivelogic.com/articles/ruby_rails_lighttpd_mysql_tiger/
You should also be able to get MySQL from "MySQL's website":http://www.mysql.com/ as they now distribute binary versions for Mac OS X (at the time of writing this document, you can find the 5.0.51a MySQL Community Server at "MySQL Community Server":http://dev.mysql.com/downloads/mysql/5.0.html#macosx-dmg).
Install "!DarwinPorts":http://darwinports.com/ and then install git, !ImageMagick and ghostscript:
{{{
$ sudo port install git-core
$ sudo port install ImageMagick
$ sudo port install ghostscript
}}}
Note: the previous step takes a long while to complete, so make yourself a coffee (or two).
The XML files are parsed and inserted into the database with Perl (and there are quite a few other Perl scripts), so you will need a few Perl CPAN modules:
{{{
$ sudo perl -MCPAN -e shell
cpan> install Error
cpan> install XML::Twig
cpan> install DBD::mysql
cpan> install XML::RSS
}}}
=== Ubuntu 8.04 ===
Use apt-get to install the requirements:
{{{
$ sudo apt-get install apache2 php5 php5-cli mysql-server libmysqlclient15-dev git-core imagemagick libmagick9-dev ghostscript ruby rubygems ruby1.8-dev
}}}
Install the required rubygems:
{{{
$ sudo gem install builder
$ sudo gem install rcov
$ sudo gem install mechanize -v 0.6.10
$ sudo gem install RMagick
$ sudo gem install log4r
}}}
Note: Currently !OpenAustralia requires an older version of mechanize (0.6.10), but this might change in the future.
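To confirm that the pinned mechanize version was installed, you can ask RubyGems directly. This is a small sketch rather than anything shipped with the parser: {{{gem_installed?}}} is a hypothetical helper built on {{{Gem::Specification.find_all_by_name}}}, and it has only been written with current RubyGems in mind, so it may need adjusting on the older RubyGems that ships with Ruby 1.8.

```ruby
require 'rubygems'

# Returns true if a gem called +name+ satisfying +requirement+ is installed.
# The default requirement accepts any version.
def gem_installed?(name, requirement = '>= 0')
  Gem::Specification.find_all_by_name(name, requirement).any?
end

# 0.6.10 is the mechanize version named in the note above.
if gem_installed?('mechanize', '0.6.10')
  puts 'mechanize 0.6.10 found'
else
  puts 'mechanize 0.6.10 not installed'
end
```

The same helper can be called for the other gems, for example {{{gem_installed?('builder')}}}.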
The XML files are parsed and inserted into the database with Perl (and there are quite a few other Perl scripts), so you will need a few Perl CPAN modules:
{{{
$ sudo perl -MCPAN -e shell
cpan> install Error
cpan> install XML::Twig
cpan> install DBD::mysql
cpan> install XML::RSS
}}}
=== For Windows ===
Apache, PHP and MySQL can all be installed together with the [http://www.apachefriends.org/en/xampp-windows.html Xampp for Windows] package. Perl can be downloaded from !ActiveState in the [http://www.activestate.com/Products/activeperl/index.mhtml ActivePerl] package. Ruby has its own Windows versions that you need to get from [http://www.ruby-lang.org/en/downloads/ Ruby Downloads] (choose the one-click installer option).
==== Perl ====
The x86 version of !ActivePerl comes with a GUI for installing packages that makes the whole process a lot easier. Refer to http://aspn.activestate.com/ASPN/docs/ActivePerl/5.10/faq/ActivePerl-faq2.html for instructions on running it. !ActivePerl comes with a lot of packages already installed but there are a few you'll need to install yourself, namely: XML-Twig and DBD-mysql.
If DBD-mysql fails to install, try installing it manually with:
{{{
ppm install http://cpan.uwinnipeg.ca/PPMPackages/10xx/DBD-mysql.ppd
}}}
Reference: http://dev.mysql.com/doc/refman/5.0/en/activestate-perl.html (bottom of page).
==== Ruby ====
In addition to the Ruby gems required above you'll need to install Ruby-MySQL, which can be downloaded from http://www.tmtm.org/en/ruby/mysql/.
== Installing !OpenAustralia ==
=== Web Application ===
For development purposes we keep the web application and the parser under {{{/Library/WebServer/Documents/}}}. Unless you want to patch the configuration extensively, we recommend that you install them there too; this also makes the application available under the root of the web server on your Mac.
Note for Ubuntu/Linux users: rather than {{{/Library/WebServer/Documents/}}}, the web application should go under {{{/var/www/}}}. The rest of this document will refer to {{{/Library/WebServer/Documents/}}} however, so substitute as necessary.
{{{
$ cd /Library/WebServer/
$ sudo chown -R $USER:staff Documents
}}}
Enter your admin password when prompted. This is necessary so that you don't have to use sudo every time you edit files or work on the website code.
{{{
$ cd Documents
$ git clone git://github.com/mlandauer/openaustralia.git
$ cd openaustralia
$ git submodule init
$ git submodule update
}}}
You should now have the website files located at {{{/Library/WebServer/Documents/openaustralia}}}. Also, you should now have the parser
installed under {{{/Library/WebServer/Documents/openaustralia/openaustralia-parser}}}.
=== Configuration of the Parser ===
The only configuration necessary is to change the web root if you have installed the web application in another location. That value is {{{web_root}}} in the {{{configuration.yml}}} file at the root of the {{{openaustralia-parser}}} directory.
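If you do change {{{web_root}}}, a quick sanity check can catch a mistyped path before the parser runs. The sketch below is an illustration only. It assumes that {{{web_root}}} is a top-level key in {{{configuration.yml}}} and that its value is a directory path; neither has been checked against the real file, and the helper names are made up.

```ruby
require 'yaml'

# Returns the value of web_root from the YAML file at +config_path+,
# or nil if the file does not contain a top-level web_root key.
# Assumption: web_root is a top-level key.
def web_root_from(config_path)
  config = YAML.load_file(config_path)
  config.is_a?(Hash) ? config['web_root'] : nil
end

# Returns a description of the problem, or nil if web_root looks usable.
def check_web_root(config_path)
  root = web_root_from(config_path)
  return "web_root is not set in #{config_path}" if root.nil?
  return "web_root points at #{root}, which is not a directory" unless File.directory?(root.to_s)
  nil
end

# Run from the root of openaustralia-parser; does nothing if the file is absent.
config_path = 'configuration.yml'
if File.exist?(config_path)
  puts check_web_root(config_path) || 'web_root looks fine'
end
```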
=== Configuration of the Web Application ===
We now need to configure the web application, which includes creating a DB in MySQL and loading the schema. We assume that you have MySQL running and that your MySQL super user is {{{root}}} and the account has a password.
Remember to edit your Apache httpd.conf file so that it includes the httpd.conf file in {{{openaustralia/twfy/conf/}}}.
We need to note again that these instructions are just for developers wanting to run the application on their machines and not recommendations or best-practices in performance and security.
==== MySQL ====
We need to create the database. This is pretty simple:
{{{
$ mysqladmin -u root -p create openaustralia
Enter password: ******
}}}
You are now ready to import the schema:
{{{
$ mysql -u root -p openaustralia < /Library/WebServer/Documents/openaustralia/twfy/db/schema.sql
Enter password: ******
}}}
==== Configuration ====
There is one file in the web application that you need to edit (remember NOT to commit your changes to it):
{{{/Library/WebServer/Documents/openaustralia/twfy/conf/general}}}
It's well documented and largely self-explanatory. It contains the configuration for MySQL (database name, host, username, etc.) as well as the URL and paths for the web application on your machine.
For initial testing you probably don't want to install Xapian, the search engine. If that's the case, make sure that you set:
{{{
define ("XAPIANDB", '');
}}}
== Running the Parser ==
Before you can run the parser, you will need to create the directories that will hold the images of the MPs.
{{{
$ mkdir -p pwdata/images/mps pwdata/images/mpsL
}}}
You are now ready to create the members information. You should just use:
{{{
$ ./parse-members.rb
# you should see messages on the console similar to the following
Reading members data...
Running consistency checks...
Writing XML...
Replacing existing member with new data for 5
This is for your information only, just check it looks OK.
$VAR1 = [
          '5',
          '10006',
          1,
          '',
          'Albert',
          'Adermann',
          'Fisher',
          'National Party',
          '1972-12-02',
          '1984-12-01',
          'general_election',
          'elected_elsewhere'
        ];
[...]
}}}
To download the members images:
{{{
$ ./member-images.rb
}}}
Although it is not particularly important initially, you can also download the links information (which goes on the Representatives' pages) by running:
{{{
$ ./parse-member-links.rb
}}}
You should now parse the speeches; once that is done you will have a full database.
To download the Hansard data (the speeches) for one day, say 20 September 2007:
{{{
$ ./parse-speeches.rb 2007.09.20
INFO HansardParser: Parsing speeches for Thu 20 Sep 2007...
WARN HansardParser: Not yet supporting: Procedural text: CROSS-BORDER INSOLVENCY BILL 2007: First Reading
WARN HansardParser: Not yet supporting: Procedural text: TRADEX SCHEME AMENDMENT BILL 2007: First Reading
WARN HansardParser: Not yet supporting: Procedural text: FAMILIES, COMMUNITY SERVICES AND INDIGENOUS AFFAIRS AND OTHER LEGISLATION AMENDMENT (EMERGENCY RESPONSE CONSOLIDATION) BILL 2007: First Reading
WARN HansardParser: Not yet supporting: Procedural text: TAX LAWS AMENDMENT (TAXATION OF FINANCIAL ARRANGEMENTS) BILL 2007: First Reading
WARN HansardParser: Not yet supporting: Procedural text: VETERANS' ENTITLEMENTS AMENDMENT (DISABILITY, WAR WIDOW AND WAR WIDOWER PENSIONS) BILL 2007: First Reading
WARN HansardParser: Not yet supporting: Procedural text: COMMITTEES: Legal and Constitutional Affairs Committee: Report > 09:46:00
WARN HansardParser: Not yet supporting: Procedural text: COMMITTEES: Legal and Constitutional Affairs Committee: Report: Referral to Main Committee
WARN HansardParser: Not yet supporting: Procedural text: NATIONAL HEALTH AMENDMENT (PHARMACEUTICAL BENEFITS) BILL 2007: Referred to Main Committee
[...]
db loading debates 2007-09-20
}}}
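To load more than one day you can call the parser once per date. The following sketch only builds and prints the command lines, using the {{{YYYY.MM.DD}}} date format from the example above. The helper names are made up for illustration, and it has not been checked how {{{parse-speeches.rb}}} behaves on days when Parliament did not sit, so treat the loop as a starting point.

```ruby
require 'date'

# Formats every day from +first_day+ to +last_day+ (inclusive) as YYYY.MM.DD,
# the format used in the parse-speeches.rb example above.
def parser_dates(first_day, last_day)
  (first_day..last_day).map { |day| day.strftime('%Y.%m.%d') }
end

# Builds one command (program plus argument) per day, without running anything.
def parse_commands(first_day, last_day)
  parser_dates(first_day, last_day).map { |date| ['./parse-speeches.rb', date] }
end

parse_commands(Date.new(2007, 9, 17), Date.new(2007, 9, 20)).each do |command|
  puts command.join(' ')
  # To actually run the parser, replace the line above with: system(*command)
end
```

Printing the commands first makes it easy to review the date range before any data is downloaded.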
You should now be able to view the results at "Your Webserver URL":http://localhost/openaustralia/twfy/www/docs/