5 releases
| Version | Release date |
|---|---|
| 0.2.3 | Apr 22, 2023 |
| 0.2.2 | Nov 3, 2019 |
| 0.2.1 | Jul 9, 2017 |
| 0.2.0 | Jul 8, 2017 |
| 0.1.0 | Jul 7, 2017 |
#6 in #html-table
330 downloads per month
Used in 2 crates
23KB
441 lines
Utility for extracting data from HTML tables.
This library allows you to parse tables from HTML documents and iterate over their rows. There are three entry points:
- Table::find_first finds the first table.
- Table::find_by_id finds a table by its HTML id.
- Table::find_by_headers finds a table that has certain headers.
Each of these returns an Option<Table>, since there might not be any
matching table in the HTML. Once you have a table, you can iterate over it
and access the contents of each Row.
Examples
Here is a simple example that uses Table::find_first to print the cells
in each row of a table:
let html = r#"
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_first(html).unwrap();
for row in &table {
    println!(
        "{} is {} years old",
        row.get("Name").unwrap_or("<name missing>"),
        row.get("Age").unwrap_or("<age missing>")
    )
}
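The unwrap_or fallbacks in the example show that looking up a cell by header name can fail. One way to picture why is the following std-only sketch. It is not the crate's implementation: SketchRow and index_headers are hypothetical names, and the assumption is simply that header text maps to a column index and that a row may have fewer cells than there are headers.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed row (not the crate's Row type):
/// the cells of one table row plus a shared header-to-column map.
struct SketchRow<'a> {
    headers: &'a HashMap<String, usize>,
    cells: &'a [String],
}

impl<'a> SketchRow<'a> {
    /// Looks up a cell by header text. Returns None when the header is
    /// unknown or when the row is too short to have that column.
    fn get(&self, header: &str) -> Option<&'a str> {
        let index = *self.headers.get(header)?;
        self.cells.get(index).map(String::as_str)
    }
}

/// Hypothetical helper: maps each header's text to its column index.
fn index_headers(names: &[&str]) -> HashMap<String, usize> {
    names
        .iter()
        .enumerate()
        .map(|(index, name)| (name.to_string(), index))
        .collect()
}
```

Under this model, a header that is absent from the table and a row that is missing a trailing cell both surface as None, which is why a fallback such as unwrap_or is a sensible default at the call site.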
If the document has multiple tables, we can use Table::find_by_headers
to identify the one we want:
let html = r#"
    <table></table>
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_by_headers(html, &["Age"]).unwrap();
for row in &table {
    for cell in row {
        println!("Table cell: {}", cell);
    }
}
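In the example above, asking only for "Age" selects a table whose headers are "Name" and "Age", and the empty table is skipped. That behavior is consistent with a "first table containing all requested headers" rule, which the following std-only sketch illustrates. It is an assumption about the selection logic rather than the crate's code: first_table_with_headers is a hypothetical function, each table is reduced to its list of header texts, and headers are compared by exact string equality.

```rust
/// Hypothetical sketch: returns the position of the first table whose
/// header list contains every requested header (exact text match).
/// Each inner Vec holds the header texts of one table, in document order.
fn first_table_with_headers(tables: &[Vec<&str>], wanted: &[&str]) -> Option<usize> {
    tables.iter().position(|headers| {
        // A table qualifies only if every wanted header appears in it.
        wanted
            .iter()
            .all(|want| headers.iter().any(|have| have == want))
    })
}
```

With the two tables from the example, modeled as an empty header list followed by ["Name", "Age"], a request for ["Age"] yields Some(1), and a request for a header that no table has yields None.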
TableExtract
TableExtract is a Rust library for extracting data from HTML tables. It is inspired by Perl's HTML::TableExtract.
Check out the crate documentation for more information.
Usage
TableExtract is on crates.io. To use it, just add this to your Cargo.toml:
[dependencies]
table-extract = "0.2"
Contributing
Contributions are welcome! There are two things to keep in mind:
- This project uses the stable Rust toolchain from rustup.
- This project uses cargo fmt to keep the code tidy.
License
© 2019 Mitchell Kember
TableExtract is available under the MIT License; see LICENSE for details.
Dependencies
~5.5MB
~105K SLoC