System Programming with Rust Guide
Practical System Programming for Rust Developers
Prabhu Eshwarla
BIRMINGHAM—MUMBAI
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted in any form or by any means, without the prior written permission of the
publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and
distributors, will be held liable for any damages caused or alleged to have been caused directly
or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies
and products mentioned in this book by the appropriate use of capitals. However, Packt
Publishing cannot guarantee the accuracy of this information.
ISBN 978-1-80056-096-3
[Link]
First and foremost, I'd like to thank my spiritual master, Sri Ganapathy
Sachchidananda Swamiji, to whom I owe everything. He has instilled clear
values in me and shown me the right attitude and purpose in life.
I wish to thank my parents for their unconditional love, the right guidance,
and for standing by me at all times.
My source of strength comes from my loving wife and children—Parimala,
Adithya, and Deekshita—without whose constant encouragement and
support I would not have had the courage to write this book and persevere
to complete it.
[Link]
Subscribe to our online digital library for full access to over 7,000 books and videos, as
well as industry leading tools to help you plan your personal development and advance
your career. For more information, please visit our website.
Why subscribe?
• Spend less time learning and more time coding with practical eBooks and Videos
from over 4,000 industry professionals
• Improve your learning with Skill Plans built especially for you
• Get a free eBook or video every month
• Fully searchable for easy access to vital information
• Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and
ePub files available? You can upgrade to the eBook version at [Link] and as a print
book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
customercare@[Link] for more details.
At [Link], you can also read a collection of free technical articles, sign up
for a range of free newsletters, and receive exclusive discounts and offers on Packt books
and eBooks.
Contributors
About the author
Prabhu Eshwarla has been shipping high-quality, business-critical software to large
enterprises and running IT operations for over 25 years. He is also a passionate teacher
of complex technologies.
Prabhu has worked with Hewlett Packard and has deep experience in software
engineering, engineering management, and IT operations.
Prabhu is passionate about Rust and blockchain and specializes in distributed systems.
He considers coding to be a creative craft, and an excellent tool to create new digital
worlds (and experiences) sustained through rigorous software engineering.
I'd like to thank the technical reviewer, Roman Krasiuk, whose meticulous
reviews of the manuscript and code examples, and good understanding of
the subject domain, have enhanced the technical quality of the book.
Last but not least, my sincere thanks to Packt for publishing this book, and
specifically to the following people, without whom this book would not have
been possible: Karan Gupta, for convincing me to write the book; Richa
Tripathi, for providing market insights; Nitee Shetty, Ruvika Rao, and
Prajakta Naik, for tirelessly working with me to improve the drafts; Francy
Puthiry, for keeping me on schedule; Gaurav Gala, for the verification
and packaging of code; and the others on the Packt team involved in copy
editing, proofreading, publishing, and marketing the book.
About the reviewer
Roman Krasiuk is an R&D software engineer who has worked on industry-leading
products in trading, blockchain, and energy markets. Having started his professional career
at the age of 18, he loves to dispel the myth that young people cannot occupy lead roles.
His areas of expertise include large-scale infrastructure development, the automation of
financial services, and big data engineering. Roman is a believer that coding is a form
of art and his biggest desire is to create a masterpiece that will show people just how
gorgeous code can be.
I would like to thank Daniel Durante for cultivating my love for Rust,
Jonas Frost for teaching me code discipline, and Alex Do for showing that
one person can have limitless knowledge. Special thanks to Alex Steiner for
opening the door to the land of opportunities and unveiling the power
of hard work.
2
A Tour of the Rust Programming Language
Technical requirements 30
Analyzing the problem domain 30
Parser methods 48
Operator precedence 53
3
Introduction to the Rust Standard Library
Technical requirements 68
The Rust Standard Library and systems programming 68
Exploring the Rust Standard Library 71
Computation-oriented modules 75
Syscalls-oriented modules 78
Building a template engine 83
Template syntax and design 85
Writing the template engine 94
Executing the template engine 103
Summary 104
Further reading 104
4
Managing Environment, Command Line, and Time
Technical requirements 107
Project scope and design overview 107
What will we build? 107
Technical design 111
Using the Rust Standard Library 113
Coding the imagix library 120
Developing the command-line application and testing 128
Designing the command-line interface 129
Coding the command-line binary using structopt 131
Summary 135
Table of Contents iii
6
Working with Files and Directories in Rust
Technical requirements 174
Understanding Linux system calls for file operations 174
Doing file I/O in Rust 177
Learning directory and path operations 183
Setting hard links, symbolic links, and performing queries 187
Writing a shell command in Rust (project) 188
Code overview 189
Error handling 191
Source metric computation 193
The main() function 198
Summary 200
7
Implementing Terminal I/O in Rust
Technical requirements 204
Introducing terminal I/O fundamentals 204
8
Working with Processes and Signals
Technical requirements 228
Understanding Linux process concepts and syscalls 229
How does a program become a process? 229
Delving into Linux process fundamentals 231
Spawning processes with Rust 235
Spawning new child processes 235
Terminating processes 237
Checking the status of a child process' execution 239
Handling I/O and environment variables 239
Handling the I/O of child processes 240
Setting the environment for the child process 242
Handling panic, errors, and signals 243
Aborting the current process 244
Signal handling 245
Writing a shell program in Rust (project) 248
Summary 255
9
Managing Concurrency
Technical requirements 258
Reviewing concurrency basics 258
Concurrency versus parallelism 259
Concepts of multi-threading 261
Spawning and configuring threads 264
Error handling in threads 267
Message passing between threads 269
11
Learning Network Programming
Technical requirements 310
Reviewing networking basics in Linux 310
Understanding networking primitives in the Rust standard library 314
Programming with TCP and UDP in Rust 317
Writing a UDP server and client 317
Writing a TCP server and client 320
Writing a TCP reverse proxy (project) 322
Writing the origin server – structs and methods 324
Writing the origin server – the main() function 326
Writing the reverse proxy server 331
Summary 336
12
Writing Unsafe Rust and FFI
Technical requirements 338
Introducing unsafe Rust 339
How do you distinguish between safe and unsafe Rust code? 339
Operations in unsafe Rust 340
Introducing FFIs 343
Reviewing guidelines for safe FFIs 346
Calling Rust from C (project) 347
Understanding the ABI 351
Summary 354
Other Books You May Enjoy
Index
Preface
The modern software stack is evolving rapidly in size and complexity. Technology
domains such as the cloud, the web, data science, machine learning, DevOps, containers,
IoT, embedded systems, distributed ledgers, virtual and augmented reality, and artificial
intelligence continue to evolve and specialize. This has resulted in a severe shortage of
system software developers able to build out the system infrastructure components.
Modern societies, businesses, and governments increasingly rely on digital technologies, which puts greater emphasis on developing safe, reliable, and efficient systems software and the software infrastructure that modern web and mobile applications are built on.
System programming languages such as C/C++ have proved their mettle for decades in this domain, and provide a high degree of control and performance, but at the cost of memory safety.
Higher-level languages such as Java, C#, Python, Ruby, and JavaScript provide memory
safety but offer less control over memory layout, and suffer from garbage collection pauses.
Rust is a modern, open source system programming language that promises the best of
three worlds: the type safety of Java; the speed, expressiveness, and efficiency of C++; and
memory safety without a garbage collector.
This book adopts a unique three-step approach to teaching system programming in
Rust. Each chapter in this book starts with an overview of the system programming
fundamentals and kernel system calls for that topic in Unix-like operating systems (Unix/
Linux/macOS). You will then learn how to perform common system calls using the Rust
Standard Library, and in a few cases, external crates, using abundant code snippets. This
knowledge is then reinforced through a practical example project that you will build.
Lastly, there are questions in each chapter to embed learning.
By the end of this book, you will have a sound foundational understanding of how to use
Rust to manage and control operating system resources such as memory, files, processes,
threads, system environment, peripheral devices, networking interfaces, terminals, and
shells, and you'll understand how to build cross-language bindings through FFI. Along
the way, you will learn how to use the tools of the trade, and get a firm appreciation of the
value Rust brings to build safe, performant, reliable, and efficient system-level software.
rustc --version
cargo --version
There are two types of code in each chapter, which are placed in the Packt GitHub repository for the book:
• The code corresponding to the example projects (which are referred to by named source files within the chapter)
• Independent code snippets, which are placed within the miscellaneous folder within each chapter (where applicable)
If you are using the digital version of this book, we advise you to type the code yourself
or access the code via the GitHub repository (link available in the next section). Doing so
will help you avoid any potential errors related to the copying and pasting of code.
While using the cargo run command to build and run Rust programs, you may encounter 'permission denied' messages if the user ID with which the command is run does not have sufficient permissions to perform system-level operations (such as reading from or writing to files). In such cases, one workaround is to run the program with the following command:
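One common form of this workaround (shown here as an illustration only; the exact invocation depends on your shell and sudo configuration, and is an assumption rather than the book's original command) is to run the binary with elevated privileges while preserving the current PATH so that the user's cargo installation is still found:

```shell
# Illustrative workaround (assumes sudo is available and configured):
# `env "PATH=$PATH"` forwards the invoking user's PATH into the sudo
# environment so that `cargo` resolves to the user's toolchain.
sudo env "PATH=$PATH" cargo run
```
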
Once the file is downloaded, please make sure that you unzip or extract the folder using
the latest version of:
The code bundle for the book is also hosted on GitHub at [Link]
PacktPublishing/Practical-System-Programming-for-Rust-Developers.
In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at
[Link] Check them out!
Note
The code snippets in this book are designed for learning, and not intended
to be of production quality. As a result, while the code examples are practical
and use idiomatic Rust, they are not likely to be full-featured with robust error
handling covering all types of edge cases. This is by design, so as not to impede
the learning process.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names,
filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles.
Here is an example: "We can access the now() function from the Utc module to print out
the current date and time."
A block of code is set as follows:
fn main() {
println!("Hello, time now is {:?}", chrono::Utc::now());
}
When we wish to draw your attention to a particular part of a code block, the relevant
lines or items are set in bold:
fn main() {
println!("Hello, time now is {:?}", chrono::Utc::now());
}
Bold: Indicates a new term, an important word, or words that you see onscreen. For
example, words in menus or dialog boxes appear in the text like this. Here is an example:
"You will see Hello, world! printed to your console."
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book
title in the subject of your message and email us at customercare@[Link].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes
do happen. If you have found a mistake in this book, we would be grateful if you would
report this to us. Please visit [Link]/support/errata, selecting your
book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet,
we would be grateful if you would provide us with the location address or website name.
Please contact us at copyright@[Link] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise
in and you are interested in either writing or contributing to a book, please visit
[Link].
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on
the site that you purchased it from? Potential readers can then see and use your unbiased
opinion to make purchase decisions, we at Packt can understand what you think about
our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit [Link].
Section 1:
Getting Started with
System Programming
in Rust
This section covers the foundational concepts behind system programming in Rust. It
includes a tour of Rust's features, Cargo tools, the Rust Standard Library, modules for
managing environment variables, command-line parameters, and working with time.
Example projects include a parser to evaluate arithmetic expressions, writing a feature of
an HTML template engine, and building a command-line tool for image processing.
This section comprises the following chapters:
• Chapter 1, Tools of the Trade – Rust Toolchains and Project Structures
• Chapter 2, A Tour of the Rust Programming Language
• Chapter 3, Introduction to the Rust Standard Library
• Chapter 4, Managing Environment, Command Line, and Time
1
Tools of the Trade – Rust Toolchains and Project Structures
In this chapter, we will cover, among other topics, the following:
• Writing test scripts and doing automated unit and integration testing
• Automating the generation of technical documentation
By the end of this chapter, you will have learned how to select the right project type
and toolchain; organize project code efficiently; add external and internal libraries as
dependencies; build the project for development, test, and production environments;
automate testing; and generate documentation for your Rust code.
Technical requirements
Rustup must be installed in the local development environment. Use this link for
installation: [Link]
Refer to the following link for official installation instructions:
[Link]
After installation, check that rustc and cargo have been installed correctly with the
following commands:
rustc --version
cargo --version
Rust also allows you to build different types of binaries – standalone executables, static
libraries, and dynamic libraries. If you know upfront what you will be building, you can
create the right project type with the scaffolding code generated for you.
We will cover these in this section.
Note
Rust's stable version is released every 6 weeks; for example, Rust 1.42.0 was
released on March 12, 2020, and 6 weeks later to the day, Rust 1.43 was released
on April 23, 2020.
A new nightly version of Rust is released every day. Once every 6 weeks, the
latest master branch of nightly becomes the beta version.
Most Rust developers primarily use the stable channel. Beta channel releases are not used
actively, but only to test for any regressions in the Rust language releases.
6 Tools of the Trade – Rust Toolchains and Project Structures
The nightly channel is for active language development and is published every night. The
nightly channel lets Rust develop new and experimental features and allows early adopters
to test them before they are stabilized. The price to be paid for early access is that there
may be breaking changes to these features before they get into stable releases. Rust uses
feature flags to determine what features are enabled in a given nightly release. A user who
wants to use a cutting-edge feature in a nightly version has to annotate the code with the
appropriate feature flag.
An example of a feature flag is shown here:
#![feature(try_trait)]
Note that beta and stable releases cannot use feature flags.
Rustup is configured to use the stable channel by default. To work with other channels,
here are a few commands. For a complete list, refer to the official link:
[Link]
To install nightly Rust, use this command:
rustup toolchain install nightly
To get the version of the compiler in nightly Rust, use this command:
rustc +nightly --version
To show the installed toolchains and which is currently active, use this command:
rustup show
To update the installed toolchains to the latest versions, use this command:
rustup update
Note that once rustup default <channel-name> is set, other related tools, such as
Cargo and Rustc, use the default channel set.
Which Rust channel should you use for your project? For any production-bound
projects, it is advisable to use only the stable release channel. For any experimental
projects, the nightly or beta channels may be used, with the caution that future releases may introduce breaking changes to your code.
Cargo is the tool that can be used to set up the basic project scaffolding structure for
a new Rust project. Before we create a new Rust project with Cargo, let's first understand
the options for organizing code within Rust projects:
Multiple modules can be organized into crates. Crates also serve as the unit of code
sharing across Rust projects. A crate is either a library or a binary. A crate developed by
one developer and published to a public repository can be reused by another developer or
team. The crate root is the source file that the Rust compiler starts from. For binary crates,
the crate root is [Link] and for library crates it is [Link].
One or more crates can be combined into a package. A package contains a [Link]
file, which contains information on how to build the package, including downloading and
linking the dependent crates. When Cargo is used to create a new Rust project, it creates
a package. A package must contain at least one crate – either a library or a binary crate.
A package may contain any number of binary crates, but it can contain either zero or only
one library crate.
As Rust projects grow in size, there may be a need to split up a package into multiple
units and manage them independently. A set of related packages can be organized as
a workspace. A workspace is a set of packages that share the same [Link] file
(containing details of specific versions of dependencies that are shared across all packages
in the workspace) and output directory.
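As an illustration, a workspace's root manifest might look like this (the member package names here are hypothetical, not from the book):

```toml
# Root Cargo.toml of a workspace: lists the member packages that
# share one lock file and one output directory
[workspace]
members = [
    "my-cli",      # a binary package
    "my-core-lib", # a library package shared by the members
]
```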
Let's see a few examples to understand various types of project structures in Rust.
1. The first step is to generate a Rust source package using the cargo new command.
2. Run the following command in a terminal session inside your working directory to
create a new package:
cargo new --bin first-program && cd first-program
The --bin flag is to tell Cargo to generate a package that, when compiled, would
produce a binary crate (executable).
first-program is the name of the package given. You can specify a name of
your choice.
3. Once the command executes, you will see the following directory structure:
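In a typical layout generated by cargo new, the structure looks like this:

```
first-program
├── Cargo.toml
└── src
    └── main.rs
```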
4. To generate a binary crate (or executable) from this package, run the following
command:
cargo build
This command creates a folder called target in the project root and creates
a binary crate (executable) with the same name as the package name (first-
program, in our case) in the location target/debug.
5. Execute the following from the command line:
cargo run
By default, the name of the binary crate (executable) generated is the same as the
name of the source package. If you wish to change the name of the binary crate,
add the following lines to [Link]:
[[bin]]
name = "new-first-program"
path = "src/[Link]"
You will see a new executable with the name new-first-program in the
target/debug folder. You will see Hello, world! printed to your console.
7. A cargo package can contain the source for multiple binaries. Let's learn how to add
another binary to our project. In [Link], add a new [[bin]] target below
the first one:
[[bin]]
name = "new-first-program"
path = "src/[Link]"
[[bin]]
name = "new-second-program"
path = "src/[Link]"
8. Next, create a new file, src/[Link], and add the following code:
fn main() {
println!("Hello, for the second time!");
}
You will see the statement Hello, for the second time! printed to your console. You'll
also find a new executable created in the target/debug directory with the name
new-second-program.
Congratulations! You have learned how to do the following:
• Create your first Rust source package and compile it into an executable binary crate
• Give a new name to the binary, different from the package name
• Add a second binary to the same cargo package
Note that a cargo package can contain one or more binary crates.
Configuring Cargo
A cargo package has an associated [Link] file, which is also called the manifest.
The manifest, at a minimum, contains the [package] section but can contain many
other sections. A subset of the sections are listed here:
Specifying output targets for the package: Cargo packages can have five types of targets:
• [[bin]]: A binary target is an executable program that can be run after it is built.
• [lib]: A library target produces a library that can be used by other libraries
and executables.
• [[example]]: This target is useful for libraries to demonstrate the use of external APIs to users through example code. The example source code located in the examples directory can be built into executable binaries using this target.
• [[test]]: Files located in the tests directory represent integration tests and
each of these can be compiled into a separate executable binary.
• [[bench]]: Benchmark functions defined in libraries and binaries are compiled
into separate executables.
For each of these targets, the configuration can be specified, including parameters such
as the name of the target, the source file of the target, and whether you want cargo to
automatically run test scripts and generate documentation for the target. You may recall
that in the previous section, we changed the name and set the source file for the generated
binary executable.
Specifying dependencies for the package: The source files in a package may depend on
other internal or external libraries, which are also called dependencies. Each of these in
turn may depend on other libraries and so on. Cargo downloads the list of dependencies
specified under this section and links them to the final output targets. The various types of
dependencies include the following:
Specifying build profiles: There are four types of profiles that can be specified while
building a cargo package:
• dev: The cargo build command uses the dev profile by default. Packages built
with this option are optimized for compile-time speed.
• release: The cargo build --release command enables the release profile, which is suitable for production release, and is optimized for runtime speed.
• test: The cargo test command uses this profile. This is used to build
test executables.
• bench: The cargo bench command creates the benchmark executable, which
automatically runs all functions annotated with the #[bench] attribute.
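Profile settings can also be customized in the manifest. The values below are illustrative overrides, not recommendations from the book:

```toml
# Cargo.toml: per-profile compiler settings (illustrative values)
[profile.dev]
opt-level = 0      # favor compile speed during development

[profile.release]
opt-level = 3      # favor runtime speed for production builds
```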
├── [Link]
├── src
│ └── [Link]
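The library's source file needs to export the public function that the binary will call. A minimal sketch might look like this (the exact message text is an assumption, not the book's original):

```rust
// src/lib.rs: a minimal public library function (illustrative sketch)
pub fn hello_from_lib(message: &str) {
    println!("Hello from the library, {}!", message);
}
```
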
cargo build
You will see the library built under target/debug and it will have the name
libmy_first_lib.rlib.
To invoke the function in this library, let's build a small binary crate. Create a bin
directory under src, and a new file, src/bin/[Link].
Add the following code:
use my_first_lib::hello_from_lib;
fn main() {
println!("Going to call library function");
hello_from_lib("Rust system programmer");
}
You will see the print statement in your console. Also, the binary mymain will be
placed in the target/debug folder along with the library we wrote earlier. The binary
crate looks for the library in the same folder, which it finds in this case. Hence it is able to
invoke the function within the library.
If you want to place the [Link] file in another location (instead of within src/
bin), then add a target in [Link] and mention the name and path of the binary as
shown in the following example, and move the [Link] file to the specified location:
[[bin]]
name = "mymain"
path = "src/[Link]"
Run cargo run --bin mymain and you will see the println output in your console.
[dependencies]
chrono = "0.4.0"
• [Link] registry: This is the default option and all that is needed is to specify the
package name and version string as we did earlier in this section.
• Alternative registry: While [Link] is the default registry, Cargo provides
the option to use an alternate registry. The registry name has to be configured in
the .cargo/config file, and in [Link], an entry is to be made with the
registry name, as shown in the example here:
[dependencies]
cratename = { version = "2.1", registry = "alternate-registry-name" }
• Git repository: A Git repo can be specified as the dependency. Here is how to do it:
[dependencies]
chrono = { git = "[Link]", branch = "master" }
Cargo will get the repo at the branch and location specified, and look for its
[Link] file in order to fetch its dependencies.
• Specify a local path: Cargo supports path dependencies, which means the library
can be a sub-crate within the main cargo package. While building the main cargo
package, the sub-crates that have also been specified as dependencies will be built.
But dependencies with only a path dependency cannot be uploaded to the [Link]
public registry.
• Multiple locations: Cargo supports the option to specify both a registry version and
either a Git or path location. For local builds, the Git or path version is used, and
the registry version will be used when the package is published to [Link].
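For illustration, a path dependency and a combined registry-plus-path dependency might be declared like this (the crate names and paths are hypothetical):

```toml
[dependencies]
# Local sub-crate used directly via its path
my-utils = { path = "utils" }

# Registry version used when publishing; the path version is used
# for local builds
my-shared = { version = "0.1", path = "../my-shared" }
```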
use chrono::Utc;
fn main() {
println!("Hello, time now is {:?}", Utc::now());
}
The use statement tells the compiler to bring the chrono package Utc module into the
scope of this program. We can then access the function now() from the Utc module to
print out the current date and time. The use statement is not mandatory. An alternative
way to print datetime would be as follows:
fn main() {
println!("Hello, time now is {:?}", chrono::Utc::now());
}
This would give the same result. But if you have to use functions from the chrono package multiple times in your code, it is more convenient to bring chrono and the required modules into scope once with the use statement; it also makes the code easier to type.
It is also possible to rename the imported package with the as keyword:
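The example below is a hypothetical illustration using a standard library type (the book's own example may use the chrono package instead):

```rust
// Renaming an import with the `as` keyword (illustrative example)
use std::collections::HashMap as Map;

fn main() {
    // `Map` now refers to std::collections::HashMap
    let mut scores: Map<&str, u32> = Map::new();
    scores.insert("rust", 1);
    println!("{:?}", scores);
}
```
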
For more details on managing dependencies, refer to the Cargo docs: [Link]
[Link]/cargo/reference/[Link].
In this section, we have seen how to add dependencies to a package. Any number of
dependencies can be added to [Link] and used within the program. Cargo makes
the dependency management process quite a pleasant experience.
Let's now look at another useful feature of Cargo – running automated tests.
Write a new function that returns the process ID of the currently running process. We
will look at the details of process handling in a later chapter, so you may just type in the
following code, as the focus here is on writing unit tests:
use std::process;
fn main() {
println!("{}", get_process_id());
}
fn get_process_id() -> u32 {
process::id()
}
We have written a simple (silly) function to use the standard library process module and
retrieve the process ID of the currently running process.
Use cargo check to confirm there are no syntax errors.
Let's now write a unit test. Note that we cannot know upfront what the process ID is going
to be, so all we can test is whether a number is being returned:
#[test]
fn test_if_process_id_is_returned() {
assert!(get_process_id() > 0);
}
Run cargo test. You will see that the test has passed successfully, as the function
returns a non-zero positive integer.
Note that we have written the unit tests in the same source file as the rest of the code. In
order to tell the compiler that this is a test function, we use the #[test] annotation. The
assert! macro (available in standard Rust library) is used to check whether a condition
evaluates to true. There are two other macros available, assert_eq! and assert_ne!,
which are used to test whether the two arguments passed to these macros are equal or not.
A custom error message can also be specified:
#[test]
fn test_if_process_id_is_returned() {
assert_ne!(get_process_id(), 0, "There is error in code");
}
To compile but not run the tests, use the --no-run option with the cargo test
command.
The preceding example has only one simple test function, but as the number of tests
increases, the following problems arise:
• How do we write any helper functions needed for test code and differentiate it from
the rest of the package code?
• How can we prevent the compiler from compiling tests as part of each build (to
save time) and not include test code as part of the normal build (saving disk/
memory space)?
#[cfg(test)]
mod tests {
    use super::get_process_id;

    #[test]
    fn test_if_process_id_is_returned() {
        assert_ne!(get_process_id(), 0, "There is error in code");
    }
}
cargo test will now give the same results. But what we have achieved is greater
modularity, and we've also allowed for the conditional compilation of test code.
In src/[Link], replace the existing code with the following. This is the same code we
wrote earlier, but this time it is in [Link]:
use std::process;
pub fn get_process_id() -> u32 {
process::id()
}
use integ_test_example;

#[test]
fn test1() {
    assert_ne!(integ_test_example::get_process_id(), 0, "Error in code");
}
Note the following changes to the test code compared to unit tests:
• Integration tests are external to the library, so we have to bring the library into the
scope of the integration test. This is simulating how an external user of our library
would call a function from the public interface of our library. This is in place of the super:: prefix used in unit tests to bring the tested function into scope.
• We did not have to specify the #[cfg(test)] annotation with integration
tests, because these are stored in a separate folder and cargo compiles files in this
directory only when we run cargo test.
• We still have to specify the #[test] attribute for each test function to tell the
compiler these are the test functions (and not helper/utility code) to be executed.
Run cargo test. You will see that this integration test has been run successfully.
To verify this, let's replace the code in the integration_test1.rs file with
the following:
use integ_test_example;
#[test]
fn files_test1() {
    assert_ne!(integ_test_example::get_process_id(), 0, "Error in code");
}
#[test]
fn files_test2() {
    assert_eq!(1 + 1, 2);
}
#[test]
fn process_test1() {
    assert!(true);
}
This last dummy test function is included to demonstrate running selective
test cases.
Run cargo test and you can see both tests executed.
Run cargo test files_test1 and you can see files_test1 executed.
Run cargo test files_test2 and you can see files_test2 executed.
Run cargo test files and you will see both files_test1 and files_test2
executed, but process_test1 is not executed. This is because cargo runs all
test cases whose names contain the term 'files'.
In the previous example, let's say we want to exclude process_test1 from regular
execution because it is computationally intensive and takes a lot of time to execute. The
following snippet shows how it's done:
#[test]
#[ignore]
fn process_test1() {
    assert!(true);
}
Run cargo test, and you will see that process_test1 is marked as ignored, and
hence not executed.
To run only the ignored tests in a separate iteration, use the following option:
cargo test -- --ignored
The first -- is a separator between the command-line options for the cargo command
and those for the test binary. In this case, we are passing the --ignored flag to the
test binary, hence the need for this seemingly confusing syntax.
To run tests on a single thread, use the following:
cargo test -- --test-threads=1
This command tells cargo to use only one thread for executing tests, which indirectly
means that the tests are executed in sequence.
In summary, Rust's strong built-in type system and strict ownership rules enforced by
the compiler, coupled with the ability to write and execute unit and integration test cases
as an integral part of the language and tooling, make it very appealing for writing robust,
reliable systems.
Now that we know what to document, we have to learn how to document it. There are two
ways to document your crate:
• Inline documentation comments within the crate's source code
• Standalone markdown files
You can use either approach, and the rustdoc tool will convert them into HTML, CSS,
and JavaScript code that can be viewed from a browser.
use std::process;
Run cargo doc --open to see the generated HTML documentation corresponding to
the documentation comments.
[Here is a link!]([Link]
// Function signature
// Example
```rust
use integ_test_example;
rustdoc doc/[Link]
You will find the generated HTML document [Link] in the same folder. View it in
your browser.
use std::process;
/// This function gets the process id of the current
/// executable. It returns a non-zero number
/// ```
/// fn get_id() {
///     let x = integ_test_example::get_process_id();
///     println!("{}", x);
/// }
/// ```
pub fn get_process_id() -> u32 {
    process::id()
}
If you run cargo test --doc, it will run this example code and provide the status of
the execution.
Alternatively, running cargo test will run all the test cases from the tests directory
(except those that are marked as ignored), and then run the documentation tests (that is,
code samples provided as part of the documentation).
Summary
Understanding the Cargo ecosystem of toolchains is very important to be effective as
a Rust programmer, and this chapter has provided the foundational knowledge that will be
used in future chapters.
We learned that there are three release channels in Rust – stable, beta, and nightly. Stable
is recommended for production use, nightly is for experimental features, and beta is an
interim stage to verify that there isn't any regression in Rust language releases before they
are marked stable. We also learned how to use rustup to configure the toolchain to use
for the project.
We saw different ways to organize code in Rust projects. We also learned how to build
executable binaries and shared libraries. We also looked at how to use Cargo to specify
and manage dependencies.
We covered how to write unit tests and integration tests for a Rust package using Rust's
built-in test framework, how to invoke automated tests using cargo, and how to control
test execution. We learned how to document packages both through inline documentation
comments and using standalone markdown files.
In the next chapter, we will take a quick tour of the Rust programming language, through
a hands-on project.
Further reading
• The Cargo Book ([Link]
• The Rust Book ([Link]
• Rust Forge ([Link]
• The Rustup book ([Link]
• The Rust style guide – the Rust style guide contains conventions, guidelines, and
best practices to write idiomatic Rust code, and can be found at the following link:
[Link]
guide/[Link]
2
A Tour of the Rust Programming Language
In the previous chapter, we looked at the Rust tooling ecosystem for build and dependency
management, testing, and documentation. These are critical and highly developer-friendly
tools that give us a strong foundation for starting to work on Rust projects. In this chapter,
we will build a working example that will serve as a refresher and also strengthen
key Rust programming concepts.
The goal of this chapter is to get more proficient in core Rust concepts. This is essential
before diving into the specifics of systems programming in Rust. We will achieve this by
designing and developing a command-line interface (CLI) in Rust.
The application we will be building is an arithmetic expression evaluator. Since this is
a mouthful, let's see an example.
Let's assume the user enters the following arithmetic expression on the command line:
1+2*3.2+(4/2-3/2)-2.11+2^4
For the user, it appears to be a calculator, but there is a lot involved to implement this. This
example project will introduce you to the core computer science concepts used in parsers
and compiler design. It is a non-trivial project that allows us to test the depths of core Rust
programming, but is not so overly complex that it will intimidate you.
Before you continue reading, I would recommend that you clone the code repository,
navigate to the chapter2 folder, and execute the cargo run command. At the
command-line prompt, enter a few arithmetic expressions and see the results returned by
the tool. You can exit the tool with Ctrl + C. This would give you a better appreciation for
what you are going to build in this chapter.
The following are the key learning steps for this chapter, which correspond to the various
stages of building our project:
Technical requirements
You should have Rustup and Cargo installed in your local development environment.
The GitHub repository for the code in this chapter can be found at [Link]
com/PacktPublishing/Practical-System-Programming-for-Rust-
Developers/tree/master/Chapter02.
The tool should accept an arithmetic expression as input, evaluate it, and provide the
numerical output as a floating-point number. For example, the expression
1+2*3.2+(4/2-3/2)-2.11+2^4 should evaluate to 21.79.
The arithmetic operations in scope are addition (+), subtraction (-), multiplication (*),
division (/), power (^), the negative prefix (-), and expressions enclosed in parentheses ().
Mathematical functions such as trigonometric and logarithmic functions, absolute, square
roots, and so on are not in scope.
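As a quick sanity check on the stated result, the expression can be evaluated directly in Rust, using powf() for the ^ (power) operator. This is just a verification sketch, not part of the project code:

```rust
fn main() {
    // 1+2*3.2+(4/2-3/2)-2.11+2^4, with ^ interpreted as power
    let result = 1.0 + 2.0 * 3.2 + (4.0 / 2.0 - 3.0 / 2.0) - 2.11 + 2.0_f64.powf(4.0);
    println!("{:.2}", result); // prints 21.79
    assert!((result - 21.79).abs() < 1e-9);
}
```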
With such an expression, the challenges that need to be resolved are as follows:
• The user should be able to input an arithmetic expression as free text on the
command line. Numbers, arithmetic operators, and parentheses (if any) should be
segregated and processed with different sets of rules.
• The rules of operator precedence must be taken into account (for example,
multiplication takes precedence over addition).
• Expressions enclosed within parentheses () must be given higher precedence.
• The user may or may not put spaces between the numbers and operators, but the
program must be capable of parsing inputs with or without spaces between the characters.
• If numbers contain a decimal point, continue reading the rest of the number until an
operator or parenthesis is encountered.
• Invalid inputs should be dealt with and the program should abort with a suitable
error message. Here are some examples of invalid input:
Invalid input 1: Since we don't deal with variables in this program, if a character is
entered, the program should exit with a suitable error message (for example, 2 * a is
invalid input).
Invalid input 2: If only a single parenthesis is encountered (without a matching
closing parenthesis), the program should exit with an error message.
Invalid input 3: If the arithmetic operator is not recognized, the program should
exit with an error message.
There are clearly other types of edge cases that can cause errors. But we will focus only on
these. The reader is encouraged to implement other error conditions as a further exercise.
Now that we know the scope of what we are going to build, let's design the system.
1. The user enters an arithmetic expression at the command-line input and presses
the Enter key.
2. The user input is scanned in its entirety and stored in a local variable.
3. The arithmetic expression (from the user) is scanned. The numbers are stored as
tokens of the Numeric type. Each arithmetic operator is stored as a token of that
appropriate type. For example, the + symbol will be represented as a token of type
Add, and the number 1 will be stored as a token of type Num with a value of 1. This
is done by the Lexer (or Tokenizer) module.
4. An Abstract Syntax Tree (AST) is constructed from the tokens in the previous
step, taking into account the sequence in which the tokens have to be evaluated. For
example, in the expression 1+2*3, the product of 2 and 3 must be evaluated before
the addition operator. Also, any sub-expressions enclosed within parentheses must
be evaluated on a higher priority. The final AST will reflect all such processing rules.
This is done by the Parser module.
5. From the constructed AST, the last step is to evaluate each node of the AST in
the right sequence, and aggregate them to arrive at the final value of the complete
expression. This is done by the Evaluator module.
6. The final computed value of the expression is displayed on the command line as
a program output to the user. Alternatively, any error in processing is displayed as
an error message.
This is the broad sequence of steps for processing. We will now take a look at translating
this design into Rust code.
We've so far seen the high-level design of the system. Let's now understand how the code
will be organized. A visual representation of the project structure is shown here:
3. Create the following files within the src/parsemath folder: [Link], token.rs,
[Link], [Link], and [Link].
4. Add the following to src/parsemath/[Link]:
pub mod ast;
pub mod parser;
pub mod token;
pub mod tokenizer;
Note that the Rust module system was used to structure this project. All functionality
related to parsing is in the parsemath folder. The [Link] file in this folder indicates
this is a Rust module. The [Link] file exports the functions in the various files contained
in this folder and makes them available to the main() function. In the main() function,
we then register the parsemath module so that the module tree is constructed by the
Rust compiler. Overall, the Rust module structure helps us organize code in different files
in a way that is flexible and maintainable.
In the following section, we will delve into how to determine the right data structures for
the tokenizer module.
• String slice (&str)
• String
We will choose the &str type, as we do not need to own the value or dynamically
increase the size of the expression. This is because the user will provide the arithmetic
expression once, and then the expression won't change for the duration of processing.
Here is one possible representation of the Tokenizer data structure:
src/parsemath/[Link]
If we took this approach, we may run into a problem. To understand the problem, let's
understand how tokenization takes place.
For the expression 1+21*3.2, the individual characters scanned will appear as eight
separate values, 1, +, 2, 1, *, 3, ., 2.
src/parsemath/[Link]
Note that we have changed the type of the expr field from a string slice (&str) to an
iterator type (Chars). Chars is an iterator over the characters of a string slice. This will
allow us to do iterations on expr such as [Link](), which will give the value of the
next character in the expression. But we also need to take a peek at the character following
the next character in the input expression, for reasons we mentioned earlier.
For this, the Rust standard library has a struct called Peekable, which has a peek()
method. The usage of peek() can be illustrated with an example. Let's take the arithmetic
expression 1+2:
Because we will store this expression in the expr field of Tokenizer, which is of the
peekable iterator type, we can perform next() and peek() methods on it in
sequence, as shown here:
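The sequence of next() and peek() calls on the expression 1+2 can be illustrated with a minimal standalone program using the standard library:

```rust
use std::iter::Peekable;
use std::str::Chars;

fn main() {
    let expr = "1+2";
    let mut it: Peekable<Chars> = expr.chars().peekable();
    assert_eq!(it.peek(), Some(&'1')); // look ahead without consuming
    assert_eq!(it.next(), Some('1'));  // consume '1'
    assert_eq!(it.peek(), Some(&'+')); // '+' is still unconsumed
    assert_eq!(it.next(), Some('+'));
    assert_eq!(it.next(), Some('2'));
    assert_eq!(it.next(), None);       // end of input
}
```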
To enable such an iteration operation, we will define our Tokenizer struct as follows:
src/parsemath/[Link]
use std::iter::Peekable;
use std::str::Chars;
pub struct Tokenizer {
    expr: Peekable<Chars>,
}
We are still not done with the Tokenizer struct. The earlier definition would throw
a compiler error asking to add a lifetime parameter. Why is this?, you may ask.
Structs in Rust can hold references. But Rust needs explicit lifetimes to be specified when
working with structs that contain references. That is the reason we get the compiler error
on the Tokenizer struct. To fix this, let's add lifetime annotation:
src/parsemath/[Link]
You can see that the Tokenizer struct has been given a lifetime annotation of 'a.
We have done this by declaring the name of the generic lifetime parameter 'a inside angle
brackets after the name of the struct. This tells the Rust compiler that any reference to the
Tokenizer struct cannot outlive the reference to the characters it contains.
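The annotated struct described above can be sketched as follows (a sketch consistent with the lifetime discussion; the full definition is in the chapter's source code):

```rust
use std::iter::Peekable;
use std::str::Chars;

// 'a ties the Tokenizer to the lifetime of the string slice it iterates over
pub struct Tokenizer<'a> {
    expr: Peekable<Chars<'a>>,
}

fn main() {
    // Compiles: the struct borrows from the string literal for its lifetime
    let mut t = Tokenizer { expr: "1+2".chars().peekable() };
    assert_eq!(t.expr.next(), Some('1'));
}
```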
Lifetimes in Rust
In system languages such as C/C++, operations on references can lead to
unpredictable results or failures, if the value associated with the reference has
been freed in memory.
In Rust, every reference has a lifetime, which is the scope for which the reference
is valid. The Rust compiler (specifically, the borrow checker) verifies that the
lifetime of a reference is not longer than the lifetime of the underlying value
pointed to by the reference.
How does the compiler know the lifetime of references? Most of the time, the
compiler tries to infer the lifetime of references (called elision). But where this
is not possible, the compiler expects the programmer to annotate the lifetime of
the reference explicitly. Common situations where the compiler expects explicit
lifetime annotations are in function signatures where two or more arguments
are references, and in structs where one or more members of the struct are
reference types.
More details can be found in the Rust documentation, at [Link]
[Link]/1.9.0/book/[Link].
We've seen so far how to define the Tokenizer struct, which contains the reference
to input arithmetic expression. We will next take a look at how to represent the tokens
generated as output from the Tokenizer.
To be able to represent the list of tokens that can be generated, we have to first consider
the data type of these tokens. Since the tokens can be of the Num type or one of the
operator types, we have to pick a data structure that can accommodate multiple data
types. The data type options are tuples, HashMaps, structs, and enums. If we add the
constraint that the type of data in a token can be one of many predefined variants
(allowed values), that leaves us with just one option—enums. We will define the tokens
using the enum data structure.
The representation of tokens in the enum data structure is shown in the following
screenshot:
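The screenshot is not reproduced here, but the variants can be inferred from the match arms used later in the chapter. A sketch of the Token enum would be:

```rust
// Token variants inferred from the tokenizer code shown later in this chapter
#[derive(Debug, PartialEq)]
pub enum Token {
    Num(f64),   // a numeric literal such as 3.2
    Add,        // +
    Subtract,   // -
    Multiply,   // *
    Divide,     // /
    Caret,      // ^
    LeftParen,  // (
    RightParen, // )
    EOF,        // end of the input expression
}

fn main() {
    assert_eq!(Token::Num(1.0), Token::Num(1.0));
    assert_ne!(Token::Add, Token::Subtract);
}
```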
Now that we have defined the data structures to capture the input (arithmetic expression)
and outputs (tokens) for the Tokenizer module, we now can write the code to do the
actual processing.
The following screenshot shows the full design of the Tokenizer module:
src/parsemath/[Link]
impl<'a> Tokenizer<'a> {
    pub fn new(new_expr: &'a str) -> Self {
        Tokenizer {
            expr: new_expr.chars().peekable(),
        }
    }
}
You'll notice that we are declaring a lifetime for Tokenizer in the impl line, and we are
repeating 'a twice: impl<'a> declares the lifetime 'a, and Tokenizer<'a> uses it.
Observations on lifetimes
You've seen that for Tokenizer, we declare its lifetime in three places:
1) The declaration of the Tokenizer struct
2) The declaration of the impl block for the Tokenizer struct
3) The method signature within the impl block
This may seem verbose, but Rust expects us to be specific about lifetimes
because that's how we can avoid memory-safety issues such as dangling pointers
or use-after-free errors.
The impl keyword allows us to add functionality to the Tokenizer struct. The new()
method accepts a string slice as a parameter that contains a reference to the arithmetic
expression input by the user. It constructs a new Tokenizer struct initialized with the
supplied arithmetic expression, and returns it from the function.
Note that the arithmetic expression is not stored in the struct as a string slice, but as
a peekable iterator over the string slice.
In this code, new_expr represents the string slice, new_expr.chars() represents an
iterator over the string slice, and new_expr.chars().peekable() creates a peekable
iterator over the string slice.
The difference between a regular iterator and peekable iterator is that in the former,
we can consume the next character in the string slice using the next() method, while
in the latter we can also optionally peek into the next character in the slice without
consuming it. You will see how this works as we write the code for the next() method
of the Tokenizer.
We will write the code for the next() method on the Tokenizer by implementing the
Iterator trait on the Tokenizer struct. Traits enable us to add behaviors to structs
(and enums). The Iterator trait in the standard library (std::iter::Iterator)
has a method that is required to be implemented, with the following signature:
fn next(&mut self) -> Option<Self::Item>;
The method signature specifies that this method can be called on an instance of the
Tokenizer struct and it returns Option<Token>. This means that it either returns
Some(Token) or None.
Here is the code to implement the Iterator trait on the Tokenizer struct:
src/parsemath/[Link]
match next_char {
    Some('0'..='9') => {
        let mut number = next_char?.to_string();
        Some(Token::Num([Link]::<f64>().unwrap()))
    },
    Some('+') => Some(Token::Add),
    Some('-') => Some(Token::Subtract),
    Some('*') => Some(Token::Multiply),
    Some('/') => Some(Token::Divide),
    Some('^') => Some(Token::Caret),
    Some('(') => Some(Token::LeftParen),
    Some(')') => Some(Token::RightParen),
    None => Some(Token::EOF),
Let's understand stepwise what happens when the next() method is called on
Tokenizer:
• The calling program instantiates the Tokenizer struct first by calling the
new() method, and then invokes the next() method on it. The next() method
on the Tokenizer struct reads the next character in the stored arithmetic
expression by calling next() on the expr field, which returns the next character
in the expression.
• The returned character is then evaluated using a match statement. Pattern
matching is used to determine what token to return, depending on what character
is read from the string slice reference in the expr field.
• If the character returned from string slice is an arithmetic operator (+, -, *, /, ^) or if
it is a parenthesis, the appropriate Token from the Token enum is returned. There
is a one-to-one correspondence between the character and Token here.
• If the character returned is a number, then there is some additional processing
needed. The reason is that a number may have multiple digits. Also, a number may
be a decimal, in which case it could be of the form [Link], where the number of
digits before and after the decimal point is completely unpredictable. So, for numbers,
we should use the peekable iterator on the arithmetic expression to consume
the next character and peek into the character after that to determine whether to
continue reading the number.
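Putting the steps above together, a self-contained sketch of the Iterator implementation might look like this. The Token and Tokenizer definitions are restated so that the example compiles on its own; the chapter's published code may differ in details, such as using a helper function to read the full number:

```rust
use std::iter::Peekable;
use std::str::Chars;

// Minimal restatements of the chapter's types, so the sketch is self-contained
#[derive(Debug, PartialEq)]
enum Token {
    Num(f64), Add, Subtract, Multiply, Divide, Caret, LeftParen, RightParen, EOF,
}

struct Tokenizer<'a> {
    expr: Peekable<Chars<'a>>,
}

impl<'a> Iterator for Tokenizer<'a> {
    type Item = Token;
    fn next(&mut self) -> Option<Token> {
        let next_char = self.expr.next();
        match next_char {
            Some('0'..='9') => {
                let mut number = next_char?.to_string();
                // Keep consuming while the *peeked* character continues the number
                while let Some(&c) = self.expr.peek() {
                    if c.is_ascii_digit() || c == '.' {
                        number.push(self.expr.next()?);
                    } else {
                        break;
                    }
                }
                Some(Token::Num(number.parse::<f64>().unwrap()))
            }
            Some('+') => Some(Token::Add),
            Some('-') => Some(Token::Subtract),
            Some('*') => Some(Token::Multiply),
            Some('/') => Some(Token::Divide),
            Some('^') => Some(Token::Caret),
            Some('(') => Some(Token::LeftParen),
            Some(')') => Some(Token::RightParen),
            None => Some(Token::EOF),
            _ => None, // any other character is invalid input
        }
    }
}

fn main() {
    let mut t = Tokenizer { expr: "1+21*3.2".chars().peekable() };
    assert_eq!(t.next(), Some(Token::Num(1.0)));
    assert_eq!(t.next(), Some(Token::Add));
    assert_eq!(t.next(), Some(Token::Num(21.0)));
    assert_eq!(t.next(), Some(Token::Multiply));
    assert_eq!(t.next(), Some(Token::Num(3.2)));
    assert_eq!(t.next(), Some(Token::EOF));
}
```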
The complete code for the Tokenizer can be found in the [Link] file in the
code folder on GitHub.
• Number(2.0)
• Number(3.0)
• Multiply(Number(2.0),Number(3.0))
• Number(6.0)
• Add(Multiply(Number(2.0),Number(3.0)),Number(6.0))
Each of these nodes is stored in a boxed data structure, which means the actual data value
for each node is stored in the heap memory, while the pointer to each of the nodes is
stored in a Box variable as part of the Node enum.
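A sketch of such a Node enum with boxed child nodes, with variants inferred from the eval() code shown later in the chapter (the chapter's actual definition may differ slightly):

```rust
// Each operator node owns its children through Box pointers into the heap
#[derive(Debug, PartialEq)]
enum Node {
    Number(f64),
    Add(Box<Node>, Box<Node>),
    Subtract(Box<Node>, Box<Node>),
    Multiply(Box<Node>, Box<Node>),
    Divide(Box<Node>, Box<Node>),
    Caret(Box<Node>, Box<Node>),
    Negative(Box<Node>),
}

fn main() {
    // AST for 2*3+6: Add(Multiply(Number(2.0), Number(3.0)), Number(6.0))
    let ast = Node::Add(
        Box::new(Node::Multiply(
            Box::new(Node::Number(2.0)),
            Box::new(Node::Number(3.0)),
        )),
        Box::new(Node::Number(6.0)),
    );
    println!("{:?}", ast);
}
```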
The overall design of the Parser struct is as follows:
Parser methods
The Parser struct will have two public methods:
• new(): To create a new instance of the parser. This new() method will create
a tokenizer instance passing in the arithmetic expression, and then stores the first
token (returned from Tokenizer) in its current_token field.
• parse(): To generate the AST (the node tree) from the tokens, which is the main
output of the parser.
Here is the code for the new() method. The code is self-explanatory: it creates a new
instance of Tokenizer, initializing it with the arithmetic expression, and then tries
to retrieve the first token from the expression. If successful, the token is stored in the
current_token field. If not, ParseError is returned:
src/parsemath/[Link]
The following is the code for the public parse() method. It invokes a private
generate_ast() method that does the processing recursively and returns an AST (a tree
of nodes). If successful, it returns the Node tree; if not, it propagates the error received:
src/parsemath/[Link]
The following image lists all the private and public methods in the Parser struct:
Let's now look at the code for the get_next_token() method. This method
retrieves the next token from the arithmetic expression using the Tokenizer struct
and updates the current_token field of the Parser struct. If unsuccessful, it
returns ParseError:
src/parsemath/[Link]
src/parsemath/[Link]
Let's now look at the remaining three private methods that do the bulk of the
parser processing.
The parse_number() method takes the current token, and checks for three things:
src/parsemath/[Link]
        return Ok(Node::Multiply(Box::new(expr), Box::new(right)));
    }
    Ok(expr)
}
_ => Err(ParseError::UnableToParse("Unable to parse".to_string())),
}
}
The generate_ast() method is the main workhorse of the module and is invoked
recursively. It does its processing in the following sequence:
src/parsemath/[Link]
left_expr = right_expr;
}
Ok(left_expr)
}
We have seen the various methods associated with the parser. Let's now look at another
key aspect when dealing with arithmetic operators—operator precedence.
Operator precedence
Operator precedence rules determine the order in which the arithmetic expression is
processed. Without defining this correctly, we will not be able to calculate the right computed
value of the arithmetic expression. The enum for operator precedence is as follows:
The precedence order increases from top to bottom, that is, DefaultZero < AddSub <
MulDiv < Power < Negative.
Define the operator precedence enum as shown:
src/parsemath/[Link]
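The enum, as described, can be sketched as follows. Deriving PartialOrd lets precedence levels be compared directly, mirroring the ordering stated above; the chapter's actual definition may differ in its derives:

```rust
// Precedence levels from lowest to highest, as described in the text.
// For a fieldless enum, derived PartialOrd compares by declaration order.
#[derive(Debug, PartialEq, PartialOrd)]
pub enum OperPrec {
    DefaultZero,
    AddSub,
    MulDiv,
    Power,
    Negative,
}

fn main() {
    assert!(OperPrec::AddSub < OperPrec::MulDiv);
    assert!(OperPrec::Power < OperPrec::Negative);
}
```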
src/parsemath/[Link]
impl Token {
    pub fn get_oper_prec(&self) -> OperPrec {
        use self::OperPrec::*;
        use self::Token::*;
        match *self {
            Add | Subtract => AddSub,
            Multiply | Divide => MulDiv,
            Caret => Power,
            _ => DefaultZero,
        }
    }
}
Now, let's look at the code for convert_token_to_node(). This method basically
constructs the operator-type AST nodes by checking whether the token is Add,
Subtract, Multiply, Divide, or Caret. In the case of an error, ParseError
is returned:
src/parsemath/[Link]
Token::Caret => {
    self.get_next_token()?;
    // Get right-side expression
    let right_expr = self.generate_ast(OperPrec::Power)?;
    Ok(Node::Caret(Box::new(left_expr), Box::new(right_expr)))
}
_ => Err(ParseError::InvalidOperator(format!(
    "Please enter valid operator {:?}",
    self.current_token
))),
}
}
We will look in detail at error handling later in the chapter in the Dealing with errors
section. The complete code for Parser can be found in the [Link] file in the
GitHub folder for the chapter.
src/parsemath/[Link]
// The Add arm is reconstructed here from the surrounding pattern (it was
// cut off by a page break in the original)
Add(expr1, expr2) => Ok(eval(*expr1)? + eval(*expr2)?),
Subtract(expr1, expr2) => Ok(eval(*expr1)? - eval(*expr2)?),
Multiply(expr1, expr2) => Ok(eval(*expr1)? * eval(*expr2)?),
Divide(expr1, expr2) => Ok(eval(*expr1)? / eval(*expr2)?),
Negative(expr1) => Ok(-(eval(*expr1)?)),
Caret(expr1, expr2) => Ok(eval(*expr1)?.powf(eval(*expr2)?)),
}
}
Trait objects
In the eval() method, you will notice that the method returns Box<dyn
error::Error> in case of errors. This is an example of a trait object.
We will explain this now.
In the Rust standard library, error::Error is a trait. Here, we are telling the
compiler that the eval() method should return something that implements
the Error trait. We don't know at compile time what the exact type being
returned is; we just know that whatever is returned will implement the Error
trait. The underlying error type is only known at runtime and is not statically
determined. Here, dyn error::Error is a trait object. The use of the
dyn keyword indicates it is a trait object.
When we use trait objects, the compiler does not know at compile time which
method to call on which types. This is only known at runtime, hence it is called
dynamic-dispatch (when the compiler knows what method to call at compile
time, it is called static dispatch).
Note also that we are boxing the error with Box<dyn error::Error>.
This is because we don't know the size of the error type at runtime, so boxing
is a way to get around this problem (Box is a reference type that has a known
size at compile time). The Rust standard library helps in boxing our errors by
having Box implement conversion from any type that implements the Error
trait into the trait object Box<Error>.
More details can be found in the Rust documentation, at [Link]
[Link]/book/[Link].
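The mechanism can be illustrated with a short standalone sketch, using a hypothetical MyError type (not part of the project code):

```rust
use std::error::Error;
use std::fmt;

// A hypothetical custom error type implementing the Error trait
#[derive(Debug)]
struct MyError(String);

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}
impl Error for MyError {}

// The caller only knows it receives *some* Error implementor, boxed on the heap;
// the concrete type behind Box<dyn Error> is resolved at runtime (dynamic dispatch)
fn check(n: f64) -> Result<f64, Box<dyn Error>> {
    if n.is_finite() {
        Ok(n)
    } else {
        Err(Box::new(MyError("non-finite result".to_string())))
    }
}

fn main() {
    assert_eq!(check(2.0).unwrap(), 2.0);
    assert!(check(f64::NAN).is_err());
}
```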
Result<T, E> is an enum with two variants, where Ok(T) represents success and
Err(E) represents the error returned. Pattern matching is used to handle the two types of
return values from a function.
To gain greater control over error handling and to provide more user-friendly errors
for application users, it is recommended to use a custom error type that implements the
std::error::Error trait. All types of errors from different modules in the program
can then be converted to this custom error type for uniform error handling. This is a very
effective way to deal with errors in Rust.
A lightweight approach to error handling could be to use Option<T> as the return value
from a function, where T is any generic type:
The Option type is an enum with two variants, Some(T) and None. If processing is
successful, a Some(T) value is returned, otherwise, None is returned from the function.
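A short illustration of this lightweight style, using a hypothetical helper function that is not part of the project:

```rust
// Returns None when the character is not a digit, without any error details
fn parse_digit(c: char) -> Option<f64> {
    if c.is_ascii_digit() {
        Some(c.to_digit(10)? as f64)
    } else {
        None
    }
}

fn main() {
    assert_eq!(parse_digit('7'), Some(7.0)); // success: Some value
    assert_eq!(parse_digit('x'), None);      // failure: None
}
```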
We will use both the Result and Option types for error handling in our project.
• Tokenizer module: This has two public methods—new() and next(). The
new() method is fairly simple and just creates a new instance of the Tokenizer
struct and initializes it. No error will be returned in this method. However, the
next() method returns a Token, and if there is any invalid character in the
arithmetic expression, we need to deal with this situation and communicate it
to the calling code. We will use a lightweight error handling approach here,
with Option<Token> as the return value from the next() method. If a valid
Token can be constructed from the arithmetic expression, Some(Token) will
be returned. In the case of invalid input, None will be returned. The calling
function can then interpret None as an error condition and take care of the
necessary handling.
• AST module: This has one main eval() function that computes a numeric value
given a node tree. We will return a vanilla std::error::Error in case of an
error during processing, but it will be a Boxed value because otherwise, the Rust
compiler will not know the size of the error value at compile time. The return type
from this method is Result<f64, Box<dyn error::Error>>. If processing
is successful, a numeric value (f64) is returned, else a Boxed error is returned.
We could have defined a custom error type for this module to avoid the complex
Boxed error signature, but this approach has been chosen to showcase the various
ways to do error handling in Rust.
• Token module: This has one function, get_oper_prec(), which returns the
operator precedence given an arithmetic operator as input. Since we do not see any
possibility of errors in this simple method, there will be no error type defined in the
return value of the method.
• Parser module: The Parser module contains the bulk of the processing logic.
Here, a custom error type, ParseError, will be defined, which has the
following structure:
src/parsemath/[Link]
#[derive(Debug)]
pub enum ParseError {
    UnableToParse(String),
    InvalidOperator(String),
}
src/parsemath/[Link]
Since ParseError will be the main error type returned from processing, and because
the AST module returns a Boxed error, we can write code to automatically convert any
Boxed error from the AST module into ParseError that gets returned by Parser.
The code is as follows:
src/parsemath/[Link]
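A sketch of such a conversion follows, restating the ParseError enum from above so the example stands alone; the actual implementation in the chapter's source may carry more detail from the boxed error:

```rust
use std::error::Error;

#[derive(Debug)]
pub enum ParseError {
    UnableToParse(String),
    InvalidOperator(String),
}

// Any boxed error from the AST module converts into a ParseError,
// so the parser can expose a single error type to its callers
impl From<Box<dyn Error>> for ParseError {
    fn from(_evalerr: Box<dyn Error>) -> Self {
        ParseError::UnableToParse("Unable to parse".to_string())
    }
}

fn main() {
    // Box<dyn Error> can be built from a &str via the standard From impl
    let boxed: Box<dyn Error> = "evaluation failed".into();
    let err: ParseError = boxed.into();
    println!("{:?}", err);
}
```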
This concludes the discussion on the arithmetic expression evaluator modules. In the next
section, we will take a look at how to call this module from a main() function.
1. Display prompts with instructions for the user to enter an arithmetic expression.
2. Accept an arithmetic expression in the command-line input from the user.
3. Instantiate Parser (returns a Parser object instance).
4. Parse the expression (returns the AST representation of the expression).
5. Evaluate the expression (computes the mathematical value of the expression).
6. Display the result to the user in the command-line output.
7. Invoke Parser and evaluate the mathematical expression.
src/[Link]
fn main() {
    println!("Hello! Welcome to Arithmetic expression evaluator.");
    println!("You can calculate value for expression such as 2*3+(4-5)+2^3/4.");
    println!("Allowed numbers: positive, negative and decimals.");
    println!("Supported operations: Add, Subtract, Multiply,
The main() function displays a prompt to the user, reads a line from stdin (the
command line), and invokes the evaluate() function. If the computation is successful,
it displays the computed AST and the numerical value. If unsuccessful, it prints an
error message.
The code for the evaluate() function is as follows:
src/[Link]
Ok(ast::eval(ast)?)
}
The evaluate() function instantiates a new Parser with the provided arithmetic
expression, parses it, and then invokes the eval() method on the AST module. Note the
use of the ? operator for automated propagation of any processing errors to the main()
function, where they are handled with a println! statement.
Run the following command to compile and run the program:
cargo run
You can try out various combinations of positive and negative numbers, decimals,
arithmetic operators, and optional sub-expressions in parentheses. You can also check
how an invalid input expression will produce an error message.
You can expand this project to add support for mathematical functions such as square
roots, trigonometric functions, logarithmic functions, and so on. You can also add
edge cases.
With this, we conclude the first full-length project in this book. I hope this project has
given you an idea not just of how idiomatic Rust code is written, but also of how to think
in Rust terms while designing a program.
The complete code for the main() function can be found in the [Link] file in the
GitHub folder for this chapter.
Summary
In this chapter, we built a command-line application from scratch in Rust, without
using any third-party libraries, to compute the value of arithmetic expressions.
We covered many basic concepts in Rust, including data types, how to model and design
an application domain with Rust data structures, how to split code across modules and
integrate them, how to structure code within a module as functions, how to expose
module functions to other modules, how to do pattern matching for elegant and safe code,
how to add functionality to structs and enums, how to implement traits and annotate
lifetimes, how to design and propagate custom error types, how to box types to make data
sizes predictable for the compiler, how to construct a recursive node tree and navigate it,
how to write code that recursively evaluates an expression, and how to specify lifetime
parameters for structs.
Congratulations if you successfully followed along and got some working code! If you had
any difficulties, you can refer to the final code in the GitHub repository.
This example project establishes a strong foundation from which to dig into the details
of system programming in the upcoming chapters. If you haven't fully understood every
detail of the code, there is no reason to fret. We will be writing a lot more code and
reinforcing the concepts of idiomatic Rust code as we go along in the coming chapters.
In the next chapter, we will cover the Rust standard library, and see how it supports a rich
set of built-in modules, types, traits, and functions to perform systems programming.
3
Introduction to the Rust Standard Library
In the previous chapter, we built a command-line tool using various Rust language
primitives and modules from the Rust Standard Library. However, in order to fully exploit
the power of Rust, it is imperative to understand the breadth of what features are available
within the standard library for system programming tasks, without having to reach out to
third-party crates.
In this chapter, we'll deep-dive into the structure of the Rust Standard Library. You'll get
an introduction to the standard modules for accessing system resources and learn how
to manage them programmatically. With the knowledge gained, we will implement
a tiny portion of a template engine in Rust. By the end of this chapter, you will be able to
confidently navigate the Rust Standard Library and make use of it in your projects.
The following are the key learning outcomes for this chapter:
Technical requirements
Rustup and Cargo must be installed in your local development environment. The
GitHub repository for the examples in this chapter can be found at [Link]
(PacktPublishing/Practical-System-Programming-for-Rust-Developers, in the
Chapter03 folder).
• Kernel: The kernel is the central component of an operating system that manages
system resources such as memory, disk and file systems, CPU, network, and other
devices such as the mouse, keyboard, and monitors. User programs (for example,
a command-line tool or text editor) cannot manage system resources directly. They
have to rely on the kernel to perform operations. If a text editor program wants
to read a file, it will have to make a corresponding system call, read(), which
the kernel will then execute on behalf of the editor program. The reason for this
restriction is that modern processor architectures (such as x86-64) allow the CPU
to operate at two different privilege levels—kernel mode and user mode. The user
mode has a lower level of privilege than the kernel mode. The CPU can perform
certain operations only while running in the kernel mode. This design prevents
user programs from accidentally doing tasks that could adversely affect the
system operation.
• System call (syscall) interface: The kernel also provides a system call application
programming interface that acts as the entry point for processes to request the kernel
to perform various tasks.
• Syscall wrapper APIs: A user program cannot directly make a system call in the
way normal functions are called because they cannot be resolved by the linker. So,
architecture-specific assembly language code is needed to make system calls into the
kernel. Such code is made available through wrapper libraries, which are platform-
specific. For Unix/Linux/POSIX systems, this library is libc (or glibc). For the
Windows operating system, there are equivalent APIs.
• Rust Standard Library: The Rust Standard Library is the primary interface for
Rust programs into the kernel functions of an operating system. It uses libc (or
another platform-specific equivalent library) internally to invoke system calls. The
Rust Standard Library is cross-platform, which means that the details of how system
calls are invoked (or which wrapper libraries are used) are abstracted away from the
Rust developer. There are ways to invoke system calls from Rust code without using
the standard library (for example, in embedded systems development), but that is
beyond the scope of this book.
• User space programs: These are the programs that you will write as part of this
book using the standard library. The arithmetic expression evaluator you wrote in the
previous chapter is an example of this. In this chapter, you will learn how to write
a feature of the template engine using the standard library, which is also a user space
program.
Note
Not all modules and functions within the Rust Standard Library invoke system
calls (for example, there are methods for string manipulation, and to handle
errors). As we go through the standard library, it is important to remember this
distinction.
Let's now begin our journey to understand and start using the Rust Standard Library.
Exploring the Rust Standard Library
The following figure shows a high-level view of the Rust standard library:
• Rust language primitives, which contain basic types such as signed and unsigned
integers, bool, floating point, char, array, tuple, slice, and string. Primitives are
implemented by the compiler. The Rust Standard Library includes the primitives
and builds on top of them.
• The core crate is the foundation of the Rust Standard Library. It acts as the link
between the Rust language and the standard library. It provides types, traits,
constants, and functions implemented on top of Rust primitives, and provides
the foundational building blocks for all Rust code. The core crate can be used
independently, is not platform-specific, and does not have any links to operating
system libraries (such as libc) or other external dependencies. You can instruct
the compiler to compile without the Rust Standard Library and use the core
crate instead (such an environment is called no_std in Rust parlance, which
is annotated with the #![no_std] attribute), and this is used commonly in
embedded programming.
• The alloc crate contains types, functions, and traits related to memory allocation
for heap-allocated values. It includes smart pointer types such as Box (Box<T>),
reference-counted pointers (Rc<T>), and atomically reference-counted pointers
(Arc<T>). It also includes collections such as Vec and String (note that
String is implemented in Rust as a UTF-8 sequence). This crate does not need
to be used directly when the standard library is used, as the contents of the alloc
crate are re-exported and made available as part of the std library. The only
exception to this rule is when developing in a no_std environment, when this
crate can be directly used to access its functionality.
• Modules (libraries) that are directly part of the standard library (and not
re-exported from core or alloc crates) include rich functionality for operations
around concurrency, I/O, file system access, networking, async I/O, errors, and
OS-specific functions.
In this book, we will not directly work with the core or alloc crates, but use the Rust
Standard Library modules that are a higher-level abstraction over these crates.
We will now analyze the key modules within the Rust Standard Library with a focus on
systems programming. The standard library is organized into modules. For example, the
functionality that enables user programs to run on multiple threads for concurrency is in
the std::thread module, and the Rust constructs for dealing with synchronous I/O
are in the std::io module. Understanding how the functionality within the standard
library is organized across modules is a critical part of being an effective and productive
Rust programmer.
Figure 3.3 shows the layout of the standard library modules organized into groups:
• Syscalls-oriented: These are modules that either manage system hardware resources
directly or require the kernel for other privileged operations.
• Computation-oriented: These are the modules that are oriented towards data
representation, computation, and instructions to the compiler.
Figure 3.4 shows the same module grouping as in Figure 3.3 but segregated as
Syscalls-oriented or Computation-oriented. Note that this may not be a perfect
classification as not all methods in all modules marked in the Syscalls-oriented category
involve actual system calls. But this classification can serve as a guide to find our way
around the standard library:
Computation-oriented modules
The standard library modules in this section deal mostly with programming constructs that
deal with data processing, data modeling, error handling, and instructions to the compiler.
Some of the modules may have functionality that overlaps with the syscalls-oriented
category, but this grouping is based on the primary focus of each module.
Data types
The modules related to data types and structures in the Rust Standard Library are
mentioned in this section. There are broadly two categories of data types in Rust. The
first group comprises primitive types such as integers (signed, unsigned), floating points,
and char, which are a core part of the language and compiler and the standard library
adds additional functionality to those types. The second group consists of higher-level
data structures and traits such as vectors and strings, which are implemented within the
standard library. Modules from both these groups are listed here:
• any: This can be used when the type of the value passed to a function is not known
at compile time. Runtime reflection is used to check the type and perform suitable
processing. An example of using this would be in the logging function, where we
want to customize what is logged depending on the data type.
• array: It contains utility functions such as comparing arrays, implemented over
the primitive array type. Note that Rust arrays are value types, that is, they are
allocated on the stack, and have a fixed length (not growable).
• char: This contains utility functions implemented over the char primitive type,
such as checking for digits, converting to uppercase, encoding to UTF-8, and so on.
• collections: This is Rust's standard collection library, which contains efficient
implementations of common collection data structures used in programming.
Collections in this library include Vec, LinkedList, HashMap,
HashSet, BTreeMap, BTreeSet, and BinaryHeap.
• f32, f64: This library provides constants specific to floating point implementations
of the f32 and f64 primitive types. Examples of constants are MAX and MIN, which
provide the maximum and minimum value of floating point numbers that can be
stored by f32 and f64 types.
• i8, i16, i32, i64, i128: Signed integer types of various sizes. For example, i8
represents a signed integer of length 8 bits (1 byte) and i128 represents a signed
integer of length 128 bits (16 bytes).
• u8, u16, u32, u64, u128: Unsigned integer types of various sizes. For example,
u8 represents an unsigned integer of length 8 bits (1 byte) and u128 represents an
unsigned integer of length 128 bits (16 bytes).
• isize, usize: Rust has two data types, isize and usize, that correspond to
signed and unsigned integer types. The uniqueness of these types is that their size is
dependent on whether the CPU uses a 32-bit or 64-bit architecture. For example, on
a 32-bit system, the size of the isize and usize data types is 32 bits (4 bytes), and
likewise, for 64-bit systems, their size is 64 bits (8 bytes).
• marker: Basic properties that can be attached to types (in the form of traits) are
described in this module. Examples include Copy (types whose values can be
duplicated by a simple copy of its bits) and Send (thread-safe types).
• slice: Contains structs and methods useful to perform operations such as
iterate and split on slice data types.
• string: This module contains the String type and methods such as
to_string, which allows converting a value to a String. Note that String is
not a primitive data type in Rust. The primitive types in Rust are listed here:
[Link]
• str: This module contains structs and methods associated with string slices such as
iterate and split on str slices.
• vec: This module contains the Vector type, which is a growable array with heap-
allocated contents, and associated methods for operating on vectors such as splicing
and iterating. Like Box<T>, a Vec is an owned type that acts as a smart pointer to
its heap-allocated contents. Note that vec was originally defined in the alloc
crate, but was made available as part of both the std::vec and
std::collections modules.
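The architecture dependence of isize and usize described in the list above can be verified directly with std::mem::size_of (a small illustrative snippet; the printed values depend on your target platform):

```rust
use std::mem::size_of;

fn main() {
    // On a 64-bit target, these all print 8 (bytes); on 32-bit, 4.
    println!("isize: {} bytes", size_of::<isize>());
    println!("usize: {} bytes", size_of::<usize>());
    // isize and usize always match the width of a raw pointer.
    println!("pointer: {} bytes", size_of::<*const u8>());
}
```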
Data processing
This is an assorted collection of modules that provides helper methods for different types
of processing such as dealing with ASCII characters, comparing, ordering, and printing
formatted values, arithmetic operations, and iterators:
• ascii: Most string operations in Rust act on UTF-8 strings and characters. But in
some cases, there may be a need to operate on ASCII characters only. This module
provides operations on ASCII strings and characters.
• cmp: This module contains functions for ordering and comparing values, and
associated macros. For example, implementing the Eq trait contained in this module
allows a comparison of custom struct instances using the == and != operators.
• fmt: This module contains utilities to format and print strings. Implementing the
Display trait from this module enables printing any custom data type using the
format! macro.
• hash: This module provides functionality to compute a hash of data objects.
• iter: This module contains the Iterator trait, which is part and parcel of
idiomatic Rust code, and a popular feature of Rust. This trait can be implemented by
custom data types for iterating over their values.
• num: This module provides additional data types for numeric operations.
• ops: This module has a set of traits that allow you to overload operators for custom
data types. For example, the Add trait can be implemented for a custom struct and
the + operator can be used to add two structs of that type.
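As an illustration of operator overloading with the ops module, the Add trait can be implemented for a custom struct (Point here is an invented type for this sketch, not something from the chapter's project):

```rust
use std::ops::Add;

#[derive(Debug, PartialEq, Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

// Implementing the Add trait from std::ops lets us use the + operator
// on two Point values.
impl Add for Point {
    type Output = Point;
    fn add(self, other: Point) -> Point {
        Point { x: self.x + other.x, y: self.y + other.y }
    }
}

fn main() {
    let sum = Point { x: 1, y: 2 } + Point { x: 3, y: 4 };
    assert_eq!(sum, Point { x: 4, y: 6 });
    println!("{:?}", sum);
}
```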
Error handling
This group consists of modules that have functionality for error handling in Rust
programs. The Error trait is the foundational construct to represent errors. Result
deals with the presence or absence of errors in the return value of functions, and Option
deals with the presence or absence of values in a variable. The latter prevents the dreaded
null value error that plagues several programming languages. Panic is provided as a way
to exit the program if errors cannot be handled:
• error: This module contains the Error trait, which represents the basic
expectations of error values. All errors implement the trait Error, and this module
is used to implement custom or application-specific error types.
• option: This module contains the Option type, which provides the ability for
a value to be initialized to either Some value or None value. The Option type can
be considered as a very basic way to handle errors involving the absence of values.
Null values cause havoc in other programming languages in the form of null pointer
exceptions or the equivalent.
• panic: This module provides support to deal with panic including capturing the
cause of panic and setting hooks to trigger custom logic on panic.
• result: This module contains the Result type, which along with the Error
trait and Option type form the foundation of error handling in Rust. Result is
represented as Result<T,E>, which is used to return either values or errors from
functions. Functions return the Result type whenever errors are expected and if
the error is recoverable.
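The interplay of Option, Result, and the ? operator can be seen in a short self-contained sketch (the function names here are invented for illustration):

```rust
use std::num::ParseIntError;

// Option: the return value may simply be absent (no error involved).
fn first_word(s: &str) -> Option<&str> {
    s.split_whitespace().next()
}

// Result: the operation may fail; the ? operator propagates the error
// to the caller automatically.
fn double_parsed(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.trim().parse()?;
    Ok(n * 2)
}

fn main() {
    assert_eq!(first_word("hello world"), Some("hello"));
    assert_eq!(first_word("   "), None);
    assert_eq!(double_parsed("21"), Ok(42));
    assert!(double_parsed("abc").is_err());
}
```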
Compiler
This group contains modules that are related to the Rust compiler.
• hint: This module contains functions to hint to the compiler about how code
should be emitted or optimized.
• prelude: The prelude is the list of items that Rust automatically imports into each
Rust program. It is a convenience feature.
• primitive: This module re-exports Rust primitive types, normally for use in
macro code.
We've so far seen the computation-oriented modules of the Rust standard library. Let's
take a look at the syscalls-oriented modules now.
Syscalls-oriented modules
While the previous group of modules was related to in-memory computations, this
section deals with operations that involve managing hardware resources or other
privileged operations that normally require kernel intervention. Note that not all methods
in these modules involve system calls to the kernel, but it helps to construct a mental
model at the module level.
Memory management
This grouping contains a set of modules from the standard library that deal with memory
management and smart pointers. Memory management includes static memory allocation
(on the stack), dynamic memory allocation (on the heap), memory deallocation (when a
variable goes out of scope, its destructor is run), cloning or copying values, managing raw
pointers and smart pointers (which are pointers to data on the heap), and fixing memory
locations for objects so that they cannot be moved around (which is needed for special
situations). The modules are as follows:
• alloc: This module contains APIs for the allocation and deallocation of
memory, and to register a custom or third-party memory allocator as the
standard library's default.
• pin: Types in Rust are movable, by default. For example, on a Vec type, a pop()
operation moves a value out and a push operation may result in the reallocation
of memory. However, there are situations where it is useful to have objects that
have fixed memory locations and do not move. For example, self-referencing data
structures such as linked lists. For such cases, Rust provides a data type that pins
data to a location in memory. This is achieved by wrapping a pointer in the pinned
pointer type, Pin<P>, which pins the pointed-to value in its place in memory.
• ptr: Working with raw pointers in Rust is not common, and is used only in selective
use cases. Rust allows working with raw pointers in unsafe code blocks, where
the compiler does not take responsibility for memory safety and the programmer
is responsible for memory-safe operations. This module provides functions to
work with raw pointers. Rust supports two types of raw pointers—immutable (for
example, *const i32) and mutable (for example, *mut i32). Raw pointers have
no restrictions on how they are used. They are the only pointer type in Rust that can
be null, and there is no automatic dereferencing of raw pointers.
• rc: This module provides single-threaded reference-counting pointers, where rc
stands for reference-counted. A reference-counted pointer to an object of type T can
be represented as Rc<T>. Rc<T> provides shared ownership of value T, which is
allocated in the heap. If a value of this type is cloned, it returns a new pointer to the
same memory location in the heap (does not duplicate the value in memory). This
value is retained as long as at least one Rc pointer referencing it exists; when the
last Rc pointer is dropped, the value is also dropped.
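The reference-counting behavior of Rc described above can be observed with Rc::strong_count (a minimal sketch):

```rust
use std::rc::Rc;

fn main() {
    // A single heap allocation, shared by multiple owners.
    let a = Rc::new(String::from("shared value"));
    assert_eq!(Rc::strong_count(&a), 1);

    // Cloning bumps the reference count; the String is not duplicated.
    let b = Rc::clone(&a);
    assert_eq!(Rc::strong_count(&a), 2);

    // Dropping a pointer decrements the count; the value is freed
    // only when the count reaches zero.
    drop(b);
    assert_eq!(Rc::strong_count(&a), 1);
}
```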
Concurrency
This category groups modules related to synchronous concurrent processing. Concurrent
programs can be designed in Rust by spawning processes, spawning threads within a
process, and having ways to synchronize and share data across threads and processes.
Asynchronous concurrency is covered under the Async group.
• process: This module provides functions for dealing with processes including
spawning a new process, handling I/O, and terminating processes.
• sync: The sequence of instructions executed in a Rust program may vary in cases
where concurrency is involved. In such cases, there may be multiple threads of
execution in parallel (for example, multiple threads in a multi-core CPU), in which
case synchronization primitives are needed to coordinate operations across threads.
This module includes synchronization primitives such as Arc, Mutex, RwLock,
and Condvar.
• thread: Rust's threading model consists of native OS threads. This module
provides functionality to work with threads such as spawning new threads, and
configuring, naming, and synchronizing them.
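A brief sketch combining the thread and sync modules, with Arc providing shared ownership across threads and Mutex providing synchronized mutation (illustrative only):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // A counter shared by four threads.
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];

    for _ in 0..4 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            // Lock the mutex before mutating the shared value.
            *counter.lock().unwrap() += 1;
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4);
}
```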
File system
This contains two modules that deal with filesystem operations. The fs module deals
with methods for working with and manipulating the contents of the local file system.
The path module provides methods to navigate and manipulate directory and file system
paths programmatically:
• fs: This module contains operations to work with and manipulate file systems.
Note that operations in this module can be used cross-platform. Structs and
methods in this module deal with files, naming, file types, directories, file metadata,
permissions, and iterating over entries in a directory.
• path: This module provides the types PathBuf and Path for working with and
manipulating paths.
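The path module can be exercised without touching the disk, since these operations are purely lexical (the file path below is made up for the example):

```rust
use std::path::{Path, PathBuf};

fn main() {
    // No filesystem access happens here; these are string-level operations.
    let path = Path::new("/tmp/reports/summary.txt");
    assert_eq!(path.file_name().unwrap(), "summary.txt");
    assert_eq!(path.extension().unwrap(), "txt");
    assert_eq!(path.parent().unwrap(), Path::new("/tmp/reports"));

    // PathBuf is the owned, growable counterpart of Path.
    let mut owned = PathBuf::from("/tmp");
    owned.push("reports");
    assert_eq!(owned, PathBuf::from("/tmp/reports"));
}
```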
Input-Output
This contains the io module, which provides core I/O functionality. The io module
contains common functions that are used while dealing with inputs and outputs. This
includes reading and writing to I/O types, such as files or TCP streams, buffered reads and
writes for better performance, and working with standard input and output.
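Because the buffered readers in std::io work over any type implementing Read, the same code can be tried against an in-memory Cursor instead of a file or TCP stream (a minimal sketch):

```rust
use std::io::{BufRead, BufReader, Cursor};

fn main() {
    // Cursor turns an in-memory buffer into an I/O source, so this
    // BufRead code would work unchanged on a File or TcpStream.
    let data = Cursor::new("line one\nline two\nline three\n");
    let reader = BufReader::new(data);
    let lines: Vec<String> = reader.lines().map(|l| l.unwrap()).collect();
    assert_eq!(lines.len(), 3);
    assert_eq!(lines[0], "line one");
}
```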
Networking
The core networking functionality is provided by the net module. This module contains
the primitives for TCP and UDP communications and for working with ports and sockets.
OS-specific
The OS-specific functions are provided in the os module. This module contains
platform-specific definitions and extensions for the Linux, Unix, and Windows
operating systems.
Time
The time module provides functions to work with system time. This module
contains structs to deal with system time and to compute durations, typically used
for system timeouts.
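The distinction between the monotonic Instant clock (for measuring durations) and SystemTime (wall-clock time) can be sketched as follows:

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

fn main() {
    // Instant: a monotonic clock, suitable for measuring elapsed time.
    let start = Instant::now();
    let mut sum: u64 = 0;
    for i in 0..1_000 {
        sum += i;
    }
    let elapsed: Duration = start.elapsed();
    assert_eq!(sum, 499_500);
    assert!(elapsed >= Duration::from_secs(0));

    // SystemTime: wall-clock time, for example seconds since the Unix epoch.
    let since_epoch = SystemTime::now().duration_since(UNIX_EPOCH).unwrap();
    assert!(since_epoch.as_secs() > 0);
}
```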
Async
Asynchronous I/O functionality is provided by the future and task modules:
• future: This contains the Future trait that serves as the foundation for building
asynchronous services in Rust.
• task: This module provides functions needed to work with asynchronous tasks
including Context, Waker, and Poll.
This concludes the overview of the Rust Standard Library modules. The Rust Standard
Library is vast and is rapidly evolving. It is highly recommended that you review the
official documentation at [Link] with
the understanding gained in this chapter, for specific methods, traits, data structures, and
example snippets.
Let's now move on to the next section where we will put this knowledge to use by writing
some code.
Figure 3.5 shows the process involved in generating HTML with a template engine:
• The static HTML includes the bank name, logo, other branding, and content that is
common to all users.
• The dynamic portion of the web page contains the actual list of past transactions
for the logged-in user. The transaction list varies from user to user.
• A frontend (web) designer can author the static HTML with sample data using web
design tools.
• A template designer would convert the static HTML into an HTML template
embedding the metadata for the dynamic portions of the page in specific syntax.
• At runtime (when the page request comes into the server), the template engine takes
the template file from the specified location, applies the transaction list for the
logged-in user from the database, and generates the final HTML page.
Building a template engine
Figure 3.6 shows a sample template and the HTML generated from the template engine:
• On the left-hand side, a sample template file is shown. The template file is a mix
of static and dynamic content. An example of static content is <h1> Welcome
to XYZ Bank </h1>. An example of dynamic content is <p> Welcome
{{name}} </p>, because the value for name will be substituted at runtime. There
are three types of dynamic content shown in the template file – an if tag, a for tag,
and a template variable.
• In the middle of the figure, we can see the template engine with two sources of
inputs – template file and data source. The template engine takes these inputs and
generates the output HTML file.
Figure 3.7 explains the working of the template engine using an example:
• Parser
• HTML generator
Let's start by understanding the steps involved in HTML generation using the
template engine.
The template file contains a set of statements. Some of these are static literals while others
are placeholders for dynamic content represented using special syntax. The template
engine reads each statement from the template file. Henceforth, let's call each line
read a template string. The process flow begins with the template string read from the
template file:
1. The template string is fed to the parser. The template string in our example is
<p> Welcome {{name}} </p>.
2. The parser first determines the type of template string, which is called tokenizing.
Let's consider three types of tokens – if tags, for tags, and template variables. In
this example, a token of type template variable is generated (if the template string
contains a static literal, it is written to the HTML output without any changes).
3. Then the template string is parsed into a static literal, Welcome, and a template
variable {{name}}.
4. The outputs of the parser (from steps 2 and 3) are passed to the HTML generator.
5. Data from a data source is passed as context by the template engine to the generator.
6. The parsed token and strings (from steps 2 and 3) are combined with the context
data (from step 5) to produce the result string, which is written to the output
HTML file.
The preceding steps are repeated for every statement (template string) read from the
template file.
We cannot use the parser we created for arithmetic parsing in Chapter 2, A Tour of the
Rust Programming Language, for this example, as we need something specific for the
HTML template language syntax. We could use the general-purpose parsing libraries
(for example, nom, pest, and lalrpop are a few popular parsing libraries in Rust), but
for this book, we will custom-build a template parser. The reason for this approach is that
each parsing library has its own API and grammar that we need to familiarize ourselves
with. Doing that would deviate from the goal of this book, which is learning to write
idiomatic code in Rust from first principles.
First, let's create a new library project with cargo new --lib:
The src/[Link] file (which is automatically created by the cargo tool) will contain all
the functionality of the template engine.
Create a new file, src/[Link]. The main() function will be placed in this file.
Let's now design the code structure for the template engine. Figure 3.8 shows the
detailed design:
Data structures
ContentType is the main data structure to classify the template string read from the
template file. It is represented as enum and contains the list of possible token types read
from the template file. As each statement (template string) is read from the template
file, it is evaluated to check if it is one of the types defined in this enum. The code for
ContentType is as follows:
src/[Link]
#[derive(PartialEq, Debug)]
pub enum ContentType {
    Literal(String),
    TemplateVariable(ExpressionData),
    Tag(TagType),
    Unrecognized,
}
Pay special attention to the annotations PartialEq and Debug. The former is used to
allow content types to be compared, and the latter is used to print the values of the content
to the console.
Derivable traits
The Rust compiler can automatically derive default implementations for
a few traits defined in the standard library. Such traits are called derivable
traits. To instruct the compiler to provide default trait implementations, the
#[derive] attribute is used. Note that this can be done only for types such
as custom structs and enums that you have defined, not for types defined in
other libraries that you don't own.
Types for which trait implementations can be derived automatically include
comparison traits such as Eq, PartialEq, and Ord, and others such as
Copy, Clone, Hash, Default, and Debug.
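The effect of the two derives used on ContentType can be shown on a small invented enum (Status is made up for this sketch):

```rust
// PartialEq enables comparison with == and !=; Debug enables printing
// with the {:?} format specifier. These are the same two derives
// applied to ContentType in this chapter.
#[derive(PartialEq, Debug)]
enum Status {
    Active,
    Suspended(String),
}

fn main() {
    let a = Status::Active;
    let b = Status::Suspended("overdue".to_string());
    assert!(a != b);              // works because of PartialEq
    assert_eq!(a, Status::Active);
    println!("{:?}", b);          // works because of Debug
}
```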
TagType is a supporting data structure that is used to indicate whether a template string
corresponds to a for-tag (repetitive loop) or if-tag (display control):
src/[Link]
#[derive(PartialEq, Debug)]
pub enum TagType {
ForTag,
IfTag,
}
We will create a struct to store the result of the tokenization of the template string:
src/[Link]
#[derive(PartialEq, Debug)]
pub struct ExpressionData {
pub head: Option<String>,
pub variable: String,
pub tail: Option<String>,
}
Note that head and tail are of type Option<String> to allow for the possibility that
a template variable may not contain static literal text before or after it.
To summarize, the template string is first tokenized as type
ContentType::TemplateVariable(ExpressionData), and ExpressionData
is parsed into head="Hello", variable="name", and tail=",welcome".
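The head/variable/tail extraction summarized above can be sketched with plain string operations (the helper name get_expression_data is hypothetical; the chapter's actual parsing code may differ):

```rust
// ExpressionData mirrors the struct defined in this chapter.
#[derive(Debug, PartialEq)]
struct ExpressionData {
    head: Option<String>,
    variable: String,
    tail: Option<String>,
}

// Hypothetical helper: split a template string such as
// "Hi {{name}} ,welcome" into head, variable, and tail.
fn get_expression_data(template_string: &str) -> ExpressionData {
    let start = template_string.find("{{").expect("missing {{");
    let end = template_string.find("}}").expect("missing }}");
    ExpressionData {
        head: Some(template_string[..start].to_string()),
        variable: template_string[start + 2..end].to_string(),
        tail: Some(template_string[end + 2..].to_string()),
    }
}

fn main() {
    let parsed = get_expression_data("Hi {{name}} ,welcome");
    assert_eq!(parsed.head, Some("Hi ".to_string()));
    assert_eq!(parsed.variable, "name".to_string());
    assert_eq!(parsed.tail, Some(" ,welcome".to_string()));
}
```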
Key functions
Let's look at the key functions to implement the template engine:
• Program: main(): This is the starting point of the program. It first calls
functions to tokenize and parse the template string, accepts context data to feed
into the template, and then calls functions to generate the HTML using the parser
outputs and context data.
• Program: get_content_type(): This is the entry point into the parser. It
parses each line of the template file (which we refer to as the template string) and
classifies it as one of the following token types: Literal, Template variable, Tag, or
Unrecognized. The Tag token type can be either a for tag or an if tag. If the token
is of type Template variable, it parses the template string to extract the head, tail,
and template variable.
These types are defined as part of the ContentType enum. Let's write a few
test cases to crystallize what we would like to see as inputs and outputs to this
function, and then look at the actual code for get_content_type(). Let's
take a test-driven development (TDD) approach here.
First, create a tests module by adding the following block of code in
src/[Link]:
#[cfg(test)]
mod tests {
use super::*;
}
Place the unit tests within this tests module. Each test will begin with the
annotation #[test].
src/[Link]
#[test]
fn check_literal_test() {
let s = "<h1>Hello world</h1>";
assert_eq!(ContentType::Literal(s.to_string()),
get_content_type(s));
}
This test case is to check whether the literal string stored in variable s is tokenized as
ContentType::Literal(s).
Test case 2: To check if the content type is of the template variable type:
src/lib.rs
#[test]
fn check_template_var_test() {
let content = ExpressionData {
head: Some("Hi ".to_string()),
variable: "name".to_string(),
tail: Some(" ,welcome".to_string()),
};
assert_eq!(
ContentType::TemplateVariable(content),
get_content_type("Hi {{name}} ,welcome")
);
}
For the template variable token type, this test case checks that the expression in the template string is parsed into the head, variable, and tail components, and successfully returned as type ContentType::TemplateVariable(ExpressionData).
Test case 3 – To check if the content contains ForTag:
src/lib.rs
#[test]
fn check_for_tag_test() {
    assert_eq!(
        ContentType::Tag(TagType::ForTag),
        get_content_type("{% for name in names %} ,welcome")
    );
}
This test case is to check if a statement containing a for tag is tokenized successfully as
ContentType::Tag(TagType::ForTag).
Test case 4 – To check if the content contains IfTag:
src/lib.rs
#[test]
fn check_if_tag_test() {
assert_eq!(
ContentType::Tag(TagType::IfTag),
get_content_type("{% if name == 'Bob' %}")
);
}
The tokenization rules are as follows:
• for tags are enclosed by {% and %} and contain the for keyword.
• if tags are enclosed by {% and %} and contain the if keyword.
• Template variables are enclosed by {{ and }}.
Based on these rules, the statement is parsed and the appropriate token is returned – a
for tag, an if tag, or a template variable.
Here is the closing part of the get_content_type() function; it returns return_val, which defaults to ContentType::Unrecognized when no rule matches:
src/lib.rs
return_val = ContentType::Unrecognized;
}
return_val
}
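Since the full listing is abridged here, the following is a self-contained sketch that satisfies the four test cases above. It simplifies TemplateVariable to carry just the variable name instead of ExpressionData (an assumption made for brevity); the book's actual function differs in that respect:

```rust
#[derive(PartialEq, Debug)]
pub enum TagType {
    ForTag,
    IfTag,
}

#[derive(PartialEq, Debug)]
pub enum ContentType {
    Literal(String),
    // simplified: the chapter's version carries ExpressionData here
    TemplateVariable(String),
    Tag(TagType),
    Unrecognized,
}

// a statement matches a token type if both its delimiters are present
fn check_matching_pair(input: &str, open: &str, close: &str) -> bool {
    input.contains(open) && input.contains(close)
}

pub fn get_content_type(input: &str) -> ContentType {
    if check_matching_pair(input, "{%", "%}") {
        // tag statement: decide between for and if
        if input.contains("for") {
            ContentType::Tag(TagType::ForTag)
        } else if input.contains("if") {
            ContentType::Tag(TagType::IfTag)
        } else {
            ContentType::Unrecognized
        }
    } else if check_matching_pair(input, "{{", "}}") {
        // template variable: extract the name between {{ and }}
        let start = input.find("{{").unwrap() + 2;
        let end = input.find("}}").unwrap();
        if start <= end {
            ContentType::TemplateVariable(input[start..end].trim().to_string())
        } else {
            ContentType::Unrecognized
        }
    } else {
        ContentType::Literal(input.to_string())
    }
}

fn main() {
    assert_eq!(ContentType::Tag(TagType::ForTag), get_content_type("{% for name in names %}"));
    assert_eq!(ContentType::Tag(TagType::IfTag), get_content_type("{% if name == 'Bob' %}"));
    assert_eq!(
        ContentType::TemplateVariable("name".to_string()),
        get_content_type("Hi {{name}} ,welcome")
    );
    assert_eq!(
        ContentType::Literal("<h1>Hello</h1>".to_string()),
        get_content_type("<h1>Hello</h1>")
    );
    println!("all tokenization checks passed");
}
```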
Supporting functions
Let's now talk about supporting functions. The parser utilizes these supporting functions
to perform operations such as checking for the presence of a substring within a string,
checking for matching pairs of braces, and so on. They are needed to check whether
the template string is syntactically correct, and also to parse the template string into its
constituent parts. Before writing some more code, let's look at the test cases for these
supporting functions to understand how they will be used, and then see the code. Note
that these functions are designed to enable reuse across projects. All supporting functions
are placed in src/lib.rs:
The standard library provides a straightforward way to check for a substring within
a string slice.
• check_matching_pair(): This function checks for matching symbol strings.
Here is the test case:
#[test]
fn check_symbol_pair_test() {
    assert_eq!(true, check_matching_pair("{{Hello}}", "{{", "}}"));
}
In this test case, we pass matching tags, '{{' and '}}', to this function, and check
if both are contained within another string expression, "{{Hello}}".
Here is the code for the function:
pub fn check_matching_pair(input: &str, symbol1: &str, symbol2: &str) -> bool {
    input.contains(symbol1) && input.contains(symbol2)
}
In this function, we are checking if the two matching tags are contained within the
input string.
• get_expression_data(): This function takes an expression containing a template variable, parses it into head, variable, and tail components, and returns the results.
Here is the test case for this function:
#[test]
fn check_get_expression_data_test() {
let expression_data = ExpressionData {
head: Some("Hi ".to_string()),
variable: "name".to_string(),
tail: Some(" ,welcome".to_string()),
};
    assert_eq!(expression_data, get_expression_data("Hi {{name}} ,welcome"));
}
The function ends by returning the parsed components:
ExpressionData {
    head: Some(head),
    variable: variable,
    tail: Some(tail),
}
}
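As a compact alternative to the index-based parsing, here is a self-contained sketch of get_expression_data() using str::split_once (stable since Rust 1.52). The chapter's version uses get_index_for_symbol() instead, so treat this as an illustration rather than the book's listing:

```rust
#[derive(PartialEq, Debug)]
pub struct ExpressionData {
    pub head: Option<String>,
    pub variable: String,
    pub tail: Option<String>,
}

// split "Hi {{name}} ,welcome" into head = "Hi ", variable = "name",
// tail = " ,welcome"; split_once replaces the chapter's index arithmetic
pub fn get_expression_data(input_line: &str) -> ExpressionData {
    let (head, rest) = input_line.split_once("{{").unwrap_or(("", input_line));
    let (variable, tail) = rest.split_once("}}").unwrap_or((rest, ""));
    ExpressionData {
        head: if head.is_empty() { None } else { Some(head.to_string()) },
        variable: variable.trim().to_string(),
        tail: if tail.is_empty() { None } else { Some(tail.to_string()) },
    }
}

fn main() {
    let parsed = get_expression_data("Hi {{name}} ,welcome");
    assert_eq!(parsed.head.as_deref(), Some("Hi "));
    assert_eq!(parsed.variable, "name");
    assert_eq!(parsed.tail.as_deref(), Some(" ,welcome"));
    println!("expression parsed: {:?}", parsed);
}
```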
The following snippet shows the code for the get_index_for_symbol() function. It makes use of the char_indices() method on the string slice, available as part of the standard library, which converts the input string into an iterator that tracks both indices and characters. We then iterate over the input string and return the index of the symbol when it is found:
pub fn get_index_for_symbol(input: &str, symbol: char)
-> (bool, usize) {
let mut characters = input.char_indices();
let mut does_exist = false;
let mut index = 0;
while let Some((c, d)) = characters.next() {
if d == symbol {
does_exist = true;
index = c;
break;
}
}
(does_exist, index)
}
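A quick usage check for this function (the function body is repeated from the listing above so the snippet runs on its own):

```rust
fn get_index_for_symbol(input: &str, symbol: char) -> (bool, usize) {
    let mut characters = input.char_indices();
    let mut does_exist = false;
    let mut index = 0;
    while let Some((i, c)) = characters.next() {
        if c == symbol {
            does_exist = true;
            index = i;
            break;
        }
    }
    (does_exist, index)
}

fn main() {
    // '{' first occurs at byte index 3 in "Hi {{name}}"
    assert_eq!((true, 3), get_index_for_symbol("Hi {{name}}", '{'));
    // when the symbol is absent, the function reports (false, 0)
    assert_eq!((false, 0), get_index_for_symbol("Hello", '}'));
    println!("index lookup checks passed");
}
```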
This concludes the code for the Parser module. Let's now look at the main function that
ties all the pieces together.
• Pass context data: It creates a HashMap to pass values for the template variables
mentioned in the template. We add values for name and city to this HashMap. The
HashMap is passed to the generator function along with the parsed template input:
• Invoke parser and generator: The parser is invoked by the call to the get_content_type() function for each line of input read from the command line (standard input).
a) If the line contains a template variable, it invokes the HTML generator
generate_html_template_var() to create the HTML output.
b) If the line contains a literal string, it simply echoes back the input HTML
literal string.
c) If the line contains for or if tags, right now, we simply print out a statement
that the feature is not yet implemented. We will implement this in future chapters:
for line in io::stdin().lock().lines() {
    match get_content_type(&line.unwrap()) {
        ContentType::TemplateVariable(content) => {
            let html = generate_html_template_var(content, context.clone());
            println!("{}", html);
        }
        ContentType::Literal(text) => println!("{}", text),
        ContentType::Tag(TagType::ForTag) => println!("For Tag not implemented"),
        ContentType::Tag(TagType::IfTag) => println!("If Tag not implemented"),
        ContentType::Unrecognized => println!("Unrecognized input"),
    }
}
src/main.rs
use std::collections::HashMap;
use std::io;
use std::io::BufRead;
use template_engine::*;
fn main() {
let mut context: HashMap<String, String> =
HashMap::new();
    context.insert("name".to_string(), "Bob".to_string());
    context.insert("city".to_string(), "Boston".to_string());
src/lib.rs
use std::collections::HashMap;
pub fn generate_html_template_var(
    content: ExpressionData,
    context: HashMap<String, String>,
) -> String {
    let mut html = String::new();
    // build the output as head + substituted variable value + tail
    if let Some(h) = content.head { html.push_str(&h); }
    if let Some(value) = context.get(&content.variable) { html.push_str(value); }
    if let Some(t) = content.tail { html.push_str(&t); }
    html
}
This function constructs the output html statement consisting of head, text content, and
tail. To construct the text content, the template variables are replaced with the values from
the context data. The constructed html statement is returned from the function.
The complete code from this chapter can be found at https://github.com/PacktPublishing/Practical-System-Programming-for-Rust-Developers/tree/master/Chapter03.
2. Test for the literal string: You can enter the literal string <h2> Hello,
welcome to my page </h2>. You will see the same string printed out as there
is no transformation to be done.
3. Test for the template variable: Enter a statement with the name or city variable (as
mentioned in the main program) such as <p> My name is {{name}} </p>
or <p> I live in {{city}} </p>. You will see <p> My name is Bob
</p> or <p> I live in Boston </p> printed out corresponding to the
input. This is because we initialized the variable name to Bob and city to Boston
in the main() program. You are encouraged to enhance this code to add support
for two template vars in a single HTML statement.
4. Test for for tag and if tag: Enter a statement enclosed within {% and %}, and
containing either the string for or if. You will see one of the following messages
printed out to the terminal: For Tag not implemented or If Tag not
implemented.
You are encouraged to write the code for the for tag and if tag as an exercise.
Ensure that you check for the right sequence of symbols. For example, an invalid format such as {% for }% or %} if {% should be rejected.
Even though we have not implemented every feature of the template engine in this chapter, we have seen how to use the Rust Standard Library in a real-life use case. We have
primarily used the io, collections, iter, and str modules from the Rust Standard
Library to implement the code in this chapter. As we go through future chapters, we will
cover more of the standard library.
Summary
In this chapter, we reviewed the overall structure of the Rust Standard Library and
classified the modules of the standard library into different categories for better
understanding. You got a brief introduction to the modules in areas of concurrency,
memory management, file system operations, data processing, data types, error handling,
compiler-related, FFI, networking, I/O, OS-specific, and time-related features.
We looked at what a template engine is, how it works, and defined the scope and
requirements of our project. We designed the template engine in terms of Rust data
structures (enum and struct) and Rust functions. We saw how to write code for parsing
templates and to generate HTML for statements involving template variables. We executed
the program providing input data and verified the generated HTML in the terminal
(command line).
In the next chapter, we will take a closer look at the Rust Standard Library modules
that deal with managing process environment, command-line arguments, and
time-related functionality.
Further reading
• Django template language: [Link]
ref/templates/language/
• Rust Standard Library: [Link]
4
Managing Environment, Command Line, and Time
In the previous chapter, we looked at how the Rust Standard Library is structured. We
also wrote a portion of a basic template engine that can generate dynamic HTML page
components given an HTML template and data. From here onward, we will start to
deep-dive into specific modules of the standard library grouped by functional areas.
In this chapter, we will look at Rust Standard Library modules that pertain to working
with system environment, command-line, and time-related functions. The goal of this
chapter is for you to gain more proficiency in working with command-line parameters,
path manipulation, environment variables, and time measurements.
In this chapter, we will cover the following:
• Writing Rust programs that can discover and manipulate the system environment
and filesystem across Linux, Unix, and Windows platforms
• Creating programs that can use command-line arguments to accept configuration
parameters and user inputs
• Capturing elapsed time between events
These are relevant skills to have for systems programming in Rust. We will learn these
topics in a practical way by developing a command-line application for image processing.
Along the way, we will see more details about the path, time, env, and fs modules of
the Rust Standard Library.
First, let's see what we will be building.
Imagine that we had a tool for bulk image resizing, a tool that would look through a
filesystem directory on a desktop or server, pull out all the image files (for instance, .png
and .jpg), and resize all of them to predefined sizes (for example, small, medium, or large).
Think about how helpful such a tool would be for freeing up space on the hard disk, or for
uploading pictures to show in a mobile or web app. We will be building such a tool. Fasten
your seat belts.
Technical requirements
The GitHub repo for the code in this chapter can be found at [Link]
com/PacktPublishing/Practical-System-Programming-for-Rust-
Developers/tree/master/Chapter04.
Functional requirements
We will build a command-line tool that performs the following two operations:
• Image resize: Resizes one or more images in a source folder to a specified size
• Image stats: Provides some statistics on the image files present in the source folder
Let's name the tool ImageCLI. Figure 4.1 shows the two main features of the tool. The user provides the following inputs to drive them:
• Size: This is the desired output size of the image. If the user specifies size =
small, the output image will have 200 pixels of width; for size = medium, the
output file will have 400 pixels of width; and for size = large, the output will
have 800 pixels of width. For example, if the input image is a JPG file with a total
size of 8 MB, it can be resized to approximately < 500 KB in size by specifying
size = medium.
• Mode: The mode indicates whether the user wants to resize one image file or
multiple files. The user specifies mode = single for resizing a single file, or
mode = all for resizing all image files in a specified folder.
• Source folder: The value specified by the user for the source folder has a different
meaning depending on whether mode = single or mode = all is chosen. For
mode = single, the user specifies the value of srcfolder as the full path of the
image file with its filename. For mode = all, the user specifies, for the value of
srcfolder, the full path of the folder (the one containing the image files) without
any image filenames. For example, if mode = single and srcfolder = /user/bob/images/[Link] are used, the tool will resize that single image file, contained in the /user/bob/images folder. If mode = all and srcfolder = /user/bob/images are used, the tool will resize all the image files contained in the /user/bob/images source folder.
For our image stats functionality, users will also be able to specify a srcfolder
containing the image files and get back the number of image files in that folder, along with
the total size of all those image files. For example, if srcfolder=/user/bob/images
is used, the image stats option will give a result similar to the following: The folder
contains 200 image files with total size 2,234 MB.
Non-functional requirements
The following is a list of non-functional (technical) requirements for the project:
• The tool will be packaged and distributed as a binary and it should work on three
platforms: Linux, Unix, and Windows.
• We should be able to measure the time taken to resize the images.
• User inputs for specifying command-line flags must be case-insensitive for ease
of use.
• The tool must be able to display meaningful error messages to the user.
• The core functionality of image resizing must be separate from the command-line
interface (CLI). This way, we have the flexibility of reusing the core functionality
with a desktop graphical interface or as part of a web backend in a web application.
• The project will be organized as a library containing the image processing
functionality and a binary that provides the CLI to read and parse user input,
provide error messages, and display output messages to the user. The binary will
make use of the library for core image processing.
Project structure
Let's create the project skeleton so we can visualize the project structure better. Create a new lib project using cargo, naming the CLI tool imagecli:
cargo new --lib imagecli
1. Under the src folder, create a subfolder called imagix (for image magic!) to host the library code. Under the imagix subfolder, create four files: mod.rs, which is the entry point into the imagix library, resize.rs to host the code related to image resizing, stats.rs to host the code for image file statistics, and error.rs to contain the custom error type and error handling code.
2. Under the src folder, create a new file called main.rs, which will contain the code for the CLI.
In this subsection, we have seen the feature requirements for the tool and the desired
project structure. In the next subsection, we will look at the design for the tool.
Technical design
In this subsection, we will look at the high-level design of the tool, primarily focusing on
the image processing feature. We will design the specifics of the CLI in the Developing the
command-line application and testing section.
Our project comprises our reusable imagix library containing the core functionality
for image resizing and statistics, and a binary executable, imagecli, with a CLI. This is
depicted in Figure 4.3:
Image processing is a highly specialized domain in itself, and it is beyond the scope of this
book to cover the techniques and algorithms involved. Given the complexity and scope
of the image processing domain, we will use a third-party library that will implement the
needed algorithms and provide us with a nice API to call.
For this purpose, we will use the image-rs/image open source crate that is written in
Rust. The crate docs are at the following link: [Link]
Let's look at how we can design the imagix library using the image crate.
The image crate is fully featured and has many image processing functions. We will
however use only a small subset of features for our project. Let's recall our three key
requirements for image processing: the ability to open an image file and load it into
memory, the ability to resize the image to a desired size, and the ability to write the resized
image from memory into a file on the disk. The image::open() function and the resize() and save() methods on DynamicImage in the image-rs/image crate address these needs.
This should be adequate for our image processing requirements in this project. For the
other two concerns around path manipulation and time measurements, we will use the
Rust Standard Library, which is described in the next subsection.
Let's next look at how to perform the directory operations needed for our project.
When the user specifies mode=all, our requirement is to iterate through all the files in
the specified source folder and filter the list of image files for processing. For iterating over
directory paths, we will use the read_dir() function in the std::fs module.
use std::fs;
fn main() {
let entries = fs::read_dir("/tmp").unwrap();
for entry in entries {
if let Ok(entry) = entry {
println!("{:?}", entry.path());
}
}
}
This is the code we will use to get entries in a directory and do further processing.
Apart from reading a directory for its contents, we also need to check for the presence of a
tmp subfolder under the source folder and create it if it does not already exist. We will use
the create_dir() method from the std::fs module to create a new subdirectory.
We will see more details of the std::fs module in a later chapter.
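Since this check-and-create step is central to the resize flow, here is a minimal, self-contained sketch of it. The folder name imagix_demo_src and the ensure_tmp_subfolder helper are hypothetical, introduced only for illustration:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// ensure a tmp/ subfolder exists under the given source folder,
// creating it with std::fs::create_dir if it does not
fn ensure_tmp_subfolder(src: &Path) -> io::Result<PathBuf> {
    let tmp = src.join("tmp");
    if !tmp.exists() {
        fs::create_dir(&tmp)?;
    }
    Ok(tmp)
}

fn main() -> io::Result<()> {
    // a hypothetical source folder under the OS temp directory
    let src = std::env::temp_dir().join("imagix_demo_src");
    fs::create_dir_all(&src)?;
    let tmp = ensure_tmp_subfolder(&src)?;
    // the tmp subfolder now exists whether or not it did before
    println!("{}", tmp.exists());
    Ok(())
}
```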
Time measurement
For measuring time, we can use the std::time module.
The std::time module in the Rust Standard Library has several time-related functions
including getting the current system time, creating a duration to represent a span of time,
and measuring the time elapsed between two specific time instants. Some examples of
using the time module are provided in the following.
To get the current system time, we can write the following code:
use std::time::SystemTime;
fn main() {
let _now = SystemTime::now();
}
Here is how to get the elapsed time from a given point in time:
use std::thread::sleep;
use std::time::{Duration, Instant};
fn main() {
let now = Instant::now();
sleep(Duration::new(3, 0));
println!("{:?}", now.elapsed().as_secs());
}
It is easier to work with environment variables from a .env file (instead of setting
them in the console), so let's add a popular crate for this purpose, called dotenv, in
Cargo.toml:
[dependencies]
dotenv = "0.15.0"
Depending on when you are reading this book, a later version of this crate may be available, which you may choose to use.
2. In src/main.rs, add the following code:
use dotenv::dotenv;
use std::env;
fn main() {
dotenv().ok();
In the preceding code, we import the std::env module and also the
dotenv::dotenv module.
The following statement loads the environment variables from an .env file:
dotenv().ok();
A for loop can then iterate through the environment variables and print them to the console. env::vars() returns an iterator of key-value pairs for all the environment variables of the current process.
3. To test this, let's create a .env file in the project root and make the following
entries:
size=small
mode=single
srcfolder=/home/bob/images/[Link]
4. Replace the srcfolder value with your own. Run the program with the
following command:
cargo run
You will see the environment variables from the .env file printed out, along with
the others associated with the process.
5. To access the value of any particular environment variable, the
std::env::var() function can be used, which takes the key of the variable as a
parameter. Add the following statement to the main() function and see the value
of the size variable printed out:
println!("Value of size is {}", env::var("size").unwrap());
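Here is a minimal, runnable sketch of these std::env calls. It uses env::set_var to simulate what dotenv would load from the .env file, so no external crate is needed; the variable name size follows the example above:

```rust
use std::env;

fn main() {
    // simulate an entry that dotenv would load from the .env file
    env::set_var("size", "small");

    // read a single variable with std::env::var
    println!("Value of size is {}", env::var("size").unwrap());

    // env::vars() yields (key, value) pairs for the whole process environment
    assert!(env::vars().any(|(k, v)| k == "size" && v == "small"));
}
```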
We have seen how to use environment variables to accept user inputs for image processing.
Let's see how to accept user inputs with command-line parameters.
The individual values for size, mode, and source_folder will be printed out.
Of the two approaches we have seen – that is, using environment variables and command-
line parameters – the latter is more suitable for accepting inputs from end users, while the
environment variable approach is more suitable for developers configuring the tool.
However, for a user-friendly interface, the bare-bones functionality offered by
std::env::args is inadequate. We will use a third-party crate called StructOpt to
improve the user interaction with the CLI.
This concludes the deep dive into the Rust Standard Library modules for path manipulation,
time measurement, and reading environment and command-line parameters.
Here is a summary of the design approaches we have discussed, for the imagix library:
With this, we conclude this section on addressing project scope and design for the
imagix library. We are now ready to start writing the code for the image processing
library in the next section.
An overview of the overall code organization of the project is shown in Figure 4.5:
Let's first add the two external crates to Cargo.toml in the imagecli project folder root:
[dependencies]
image = "0.23.12"
structopt = "0.3.20"
In this section, we will walk through the code snippets for the get_image_files(), resize_image(), and get_stats() methods.
The rest of the code is standard Rust (not specific to the topics this chapter is focused on)
and can be found in the code repository for this chapter.
The logic of the get_image_files() method is as follows:
1. We first retrieve the directory entries in the source folder and collect them in
a vector.
2. We then iterate over entries in the vector and filter for only the image files. Note that
we are only focusing on PNG and JPG files in this project, but it can be extended to
other types of image files too.
3. A list of image files is returned from this method.
src/imagix/resize.rs
.map_err(|_| ImagixError::UserInputError("Invalid source folder".to_string()))?
.map(|res| res.map(|e| e.path()))
.collect::<Result<Vec<_>, io::Error>>()?
.into_iter()
.filter(|r| {
    r.extension() == Some("JPG".as_ref())
        || r.extension() == Some("jpg".as_ref())
        || r.extension() == Some("PNG".as_ref())
        || r.extension() == Some("png".as_ref())
})
.collect();
Ok(entries)
}
The code uses the read_dir() method to iterate through directory entries and collects
the results in a Vector. The Vector is then converted into an iterator, and the entries
are filtered to return only image files. This gives us the set of image files to work with, for
resizing. In the next subsection, we will review the code to perform the actual resizing of
the images.
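As a sketch of the same filtering idea, here is a self-contained, case-insensitive extension check. The is_image helper is hypothetical, introduced only to illustrate a tighter alternative to the four explicit comparisons in the listing above:

```rust
use std::path::Path;

// case-insensitive check for the two image extensions this project supports
fn is_image(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| matches!(ext.to_lowercase().as_str(), "jpg" | "png"))
        .unwrap_or(false)
}

fn main() {
    assert!(is_image(Path::new("photo.JPG")));
    assert!(is_image(Path::new("icon.png")));
    assert!(!is_image(Path::new("notes.txt")));
    assert!(!is_image(Path::new("no_extension")));
    println!("extension filter checks passed");
}
```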
Resizing images
In this subsection, we will review the code for resize_image(). This method performs
the resizing of images.
The logic for this method is as follows:
1. The method accepts a source image filename with the full source folder path,
resizes it as a .png file, and stores the resized file in a /tmp subfolder under the
source folder.
2. First, the source filename is extracted from the full path. The file extension is
changed to .png. This is because our tool will only support output files in .png
format. As an exercise, you can add support for other image format types.
3. Then the destination file path is constructed with the /tmp prefix, as the resized
image will need to be stored in the tmp subfolder under the source folder. To
achieve this, we first need to check whether the tmp folder already exists. If not, it
has to be created. The logic for constructing the path with the tmp subfolder and for
creating the tmp subfolder is shown in the previous code listing.
4. Finally, we need to resize the image. For this, the source file is opened, the resize
function is called with requisite parameters, and the resized image is written to the
output file.
5. The time taken for image resizing is calculated using Instant::now() and the elapsed() method from std::time.
The code listing is shown here. For purposes of explanation, the code listing has been split
into multiple snippets.
The code listed here accepts three input parameters – the size, source folder, and an entry
of type PathBuf (which can refer to the full path of an image file). The file extension is
changed to .png as this is the output format supported by the tool:
The code snippet here appends the suffix /tmp to the file path entry in order to create the
destination folder path. Note that due to a limitation in the standard library, the filename
is first constructed as [Link], which is subsequently changed to reflect the final resized
image filename:
The code here opens the image file and loads the image data into memory. The /tmp
subfolder is created under the source folder. Then, the image is resized and written to the
output file in the destination folder. The time taken for the resizing operation is recorded
and printed out:
We have now seen the code for resizing images. Next, we will look at the code for
generating image stats.
Image statistics
In the previous subsection, we looked at the code for image resizing. In this subsection,
we will see the logic for generating image statistics. This method will count the number
of image files in a specified source folder, and measure their total file size.
The logic of the get_stats() method that we will use is described as follows:
1. The get_stats() method takes a source folder as its input parameter and returns
two values: the number of image files in the folder, and the total aggregate size of all
image files in the folder.
2. Get a list of image files in the source folder by calling the get_image_files()
method.
src/imagix/stats.rs
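The stats listing is not reproduced here; the counting and size-aggregation logic can be sketched with the standard library alone. The folder_stats helper and the demo folder name are hypothetical, and sizes are reported in bytes rather than MB:

```rust
use std::fs;
use std::io;
use std::path::Path;

// count the files in a folder and sum their sizes in bytes,
// using fs::read_dir and fs::metadata
fn folder_stats(dir: &Path) -> io::Result<(usize, u64)> {
    let mut count = 0;
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            count += 1;
            total += fs::metadata(&path)?.len();
        }
    }
    Ok((count, total))
}

fn main() -> io::Result<()> {
    // hypothetical demo folder populated with two small files
    let dir = std::env::temp_dir().join("imagix_stats_demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("a.bin"), [0u8; 100])?;
    fs::write(dir.join("b.bin"), [0u8; 150])?;
    let (count, total) = folder_stats(&dir)?;
    println!("{} files, {} bytes", count, total);
    Ok(())
}
```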
We have covered the code for the image processing functionality. We will now cover some
details of our custom error handling for the project.
Error handling
Let's now take a look at our error handling design.
As part of our project, there are many failure conditions that we have to handle, such as invalid user input, failures in reading directories or files, and errors during image processing.
Let's define a custom error type to handle all these different types of errors in a unified
manner, and provide the error as output to the users of our library:
src/imagix/error.rs
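The enum definition itself is not shown here; the following is a hedged sketch consistent with the variant names used elsewhere in this chapter (FileIOError, UserInputError, ImageResizingError, FormatError). The book's actual listing may differ:

```rust
// unified custom error type for the imagix library; each variant
// carries a human-readable message
#[derive(Debug, PartialEq)]
pub enum ImagixError {
    FileIOError(String),
    UserInputError(String),
    ImageResizingError(String),
    FormatError(String),
}

fn main() {
    let err = ImagixError::UserInputError("Invalid source folder".to_string());
    println!("{:?}", err);
}
```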
The names of the errors are mostly self-explanatory. FormatError is any error
encountered while converting or printing values of parameters. The goal of defining this
custom error type is that the various types of errors that may be encountered during
processing, such as errors in user input, the inability to read through a directory or write
to a file, an error in image processing, and so on, are converted into our custom error type.
It is not enough to just define a custom error type. We also have to ensure that when errors
happen in due course of the program's operation, these errors are translated into the
custom error type. For example, an error in reading an image file raises an error defined
in the std::fs module. This error should be caught and transformed into our custom
error type. This way, regardless of whether there is an error in file operations or error
processing, the program uniformly propagates the same custom error type for handling by
the frontend interface to the user (in the case of this project, it is the command line).
For the conversion of various types of errors into ImagixError, we will implement the
From trait. We will also implement the Display trait for our error type so that the errors
can be printed out to the console.
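A sketch of what those two trait implementations could look like is shown next. The enum is repeated so the snippet compiles on its own; this is illustrative, not the book's exact code:

```rust
use std::fmt;
use std::io;

#[derive(Debug)]
pub enum ImagixError {
    FileIOError(String),
    UserInputError(String),
    ImageResizingError(String),
    FormatError(String),
}

// convert std::io errors (for example, from std::fs calls) into our type,
// so the ? operator propagates them as ImagixError
impl From<io::Error> for ImagixError {
    fn from(e: io::Error) -> Self {
        ImagixError::FileIOError(e.to_string())
    }
}

// allow the error to be printed to the console with {}
impl fmt::Display for ImagixError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ImagixError::FileIOError(msg)
            | ImagixError::UserInputError(msg)
            | ImagixError::ImageResizingError(msg)
            | ImagixError::FormatError(msg) => write!(f, "{}", msg),
        }
    }
}

fn main() {
    // an io::Error converts into ImagixError via the From impl
    let err: ImagixError = io::Error::new(io::ErrorKind::NotFound, "no such file").into();
    println!("{}", err); // prints "no such file"
}
```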
Within each of the methods in the project modules, at the failure points, you will notice
that ImagixError is raised and propagated back to the calling function. The source
code can be found in the source folder for this chapter in the Packt code repository.
This concludes the error handling subsection of the code.
This also concludes this section on coding the imagix library. We have only walked
through key code snippets as it isn't practical to print out the entire code listing inline in
the chapter. I would urge the reader to go through the entire source code to understand
how the various features are translated into idiomatic Rust code.
In the next section, we will build the command-line application that wraps this library and
provides the user interface.
src/imagix/resize.rs
#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_single_image_resize() {
        let mut path = PathBuf::from("/tmp/images/image1.jpg");
        let destination_path = PathBuf::from("/tmp/images/tmp/image1.png");
        match process_resize_request(SizeOption::Small, Mode::Single, &mut path) {
            Ok(_) => assert_eq!(true, destination_path.exists()),
            Err(e) => panic!("single image resize failed: {:?}", e),
        }
    }
    #[test]
    fn test_multiple_image_resize() {
        let mut path = PathBuf::from("/tmp/images/");
        let _res = process_resize_request(SizeOption::Small, Mode::All, &mut path);
        let destination_path1 = PathBuf::from("/tmp/images/tmp/image1.png");
        let destination_path2 = PathBuf::from("/tmp/images/tmp/image2.png");
        assert_eq!(true, destination_path1.exists());
        assert_eq!(true, destination_path2.exists());
    }
}
Place the image1.jpg and image2.jpg files in /tmp/images and execute the tests
with the following command:
cargo test
You can see the tests pass successfully. You can also inspect the resized images.
As an exercise, you can add the test cases for the image stats function as well.
We can now conclude that the imagix library works as intended. Let's now move on to
designing the command-line application.
We shall first look at the CLI requirements.
• For resizing images, the command is cargo run --release resize with three parameters.
• For image statistics, the command is cargo run --release stats with one parameter.
• For resizing a single image the command is cargo run --release resize
--size small --mode single --srcfolder <path-to-image-
file/[Link]>.
• For resizing multiple images, we use the cargo run --release resize
--size medium --mode all --srcfolder <path-to-folder-
containing-images> command.
• For image statistics, the cargo run --release stats --srcfolder
<path-to-folder-containing-images> command is used.
The imagecli main() function parses the command-line parameters, handles user and
processing errors with suitable messages to the user, and invokes the respective functions
from the imagix library.
Let's do a quick recap. To resize images, we need to know the size, the mode, and the source folder from the user.
In this section, we designed the CLI for the tool. In the previous sections, we built the
imagix library to resize images. We will now move on to the last part of the project,
which is to develop the main command-line binary application that ties all the pieces
together and accepts user inputs from the command-line.
1. We will start with the imports section. Note the imports of the imagix library that we have written, and StructOpt for command-line argument parsing:
mod imagix;
use ::imagix::error::ImagixError;
use ::imagix::resize::{process_resize_request, Mode,
SizeOption};
use ::imagix::stats::get_stats;
use std::path::PathBuf;
use std::str::FromStr;
use structopt::StructOpt;
// Define commandline arguments in a struct
2. We will now look at the definition of the command-line parameters for the tool. For this, we will use the structopt syntax; refer to the documentation at [Link] Basically, we have defined an enum called Commandline with two subcommands, Resize and Stats. Resize takes three arguments: size, mode, and srcfolder (the source folder). Stats takes one argument, srcfolder:
#[derive(StructOpt, Debug)]
#[structopt(
    name = "resize",
    about = "This is a tool for image resizing and stats",
    help = "Specify subcommand resize or stats. For help, type imagecli resize --help or imagecli stats --help"
)]
enum Commandline {
    #[structopt(help = "Specify size(small/medium/large), mode(single/all) and srcfolder")]
    Resize {
        #[structopt(long)]
        size: SizeOption,
        #[structopt(long)]
        mode: Mode,
        #[structopt(long, parse(from_os_str))]
        srcfolder: PathBuf,
    },
    #[structopt(help = "Specify srcfolder")]
    Stats {
        #[structopt(long, parse(from_os_str))]
        srcfolder: PathBuf,
    },
}
3. We can now review the code for the main() function. Here, we basically accept
the command-line inputs (validated by StructOpt) and invoke the suitable
methods from our imagix library. If the user specifies the Resize command, the
process_resize_request() method of the imagix library is invoked. If the
user specifies the Stats command, the get_stats() method of the imagix
library is invoked. Any errors are handled with suitable messages:
fn main() {
    let args: Commandline = Commandline::from_args();
    match args {
        Commandline::Resize {
            size,
            mode,
            mut srcfolder,
        } => {
            match process_resize_request(size, mode, &mut srcfolder) {
                Ok(_) => println!("Image resized successfully"),
                Err(e) => match e {
                    ImagixError::FileIOError(e) => println!("{}", e),
                    ImagixError::UserInputError(e) => println!("{}", e),
                    ImagixError::ImageResizingError(e) => println!("{}", e),
                    _ => println!("Error in processing"),
                },
            };
        }
        Commandline::Stats { srcfolder } => match get_stats(srcfolder) {
            Ok((count, size)) => println!(
                "Found {:?} image files with aggregate size of {:?} MB",
                count, size
            ),
            Err(e) => println!("{}", e),
        },
    }
}
The reason to use release builds is that there is a considerable difference in image-resizing time between the debug and release builds (the latter being much faster). You can then execute and test the commands listed earlier at the terminal. Ensure that you place one or more .png or .jpg files in the folder that you specify with the --srcfolder flag.
In this section, we have built a tool for image resizing that works from a CLI. As an
exercise, you can experiment by adding additional features, including adding support for
more image formats, changing the size of the output file, or even providing the option to
encrypt the generated image file for additional security.
Summary
In this chapter, we learned to write Rust programs that can discover and manipulate the
system environment, directory structures, and filesystem metadata in a cross-platform
manner, using the std::env, std::path, and std::fs modules. We looked at how
to create programs that can use command-line arguments or environment variables to
accept configuration parameters and user inputs. We saw the use of two third-party crates:
the StructOpt crate to improve the user interface of the tool, and image-rs/image
to do the image resizing.
We also learned how to use the std::time module to measure the time taken for specific processing tasks. We defined a custom error type to unify error handling in the library. In this chapter, we were also introduced to file handling operations.
In the next chapter, we will take a detailed look at doing advanced memory management
with the standard library.
Section 2:
Managing and
Controlling System
Resources in Rust
This section covers how to interact with the kernel in Rust for managing memory, files,
directories, permissions, terminal I/O, the process environment, process control and
relationships, handling signals, inter-process communications, and multithreading.
Example projects include a tool to compute Rust source file metrics, a text viewer,
a custom shell, and a multithreaded version of the Rust source file metrics tool.
This section comprises the following chapters:
We are now entering Section 2, Managing and Controlling System Resources in Rust, of the book. Figure 5.1 provides the context for this section:
We will begin the chapter with an overview (or a refresher for those already familiar
with the topic) of the general principles of memory management in OSes, including the
memory management lifecycle and the layout of a process in memory. We will then cover
the memory layout of a running Rust program. This will cover how a Rust program is
laid out in memory and the characteristics of the heap, stack, and static data segments. In
the third section, we learn about the Rust memory management lifecycle, how it differs
from other programming languages, and how memory is allocated, manipulated, and
released in Rust programs. Lastly, we will enhance the template engine that we started to
build in Chapter 3, Introduction to the Rust Standard Library and Key Crates for Systems
Programming, with a dynamic data structure.
Technical requirements
Rustup and Cargo must be installed in a local development environment.
The complete code for this chapter can be found at [Link] PacktPublishing/Practical-System-Programming-for-Rust-Developers/tree/master/Chapter05.
We have learned about the memory management lifecycle of a process. Let's now
understand how a program is laid out in memory by the operating system.
The portion of Figure 5.3 marked A shows that the overall virtual memory space allocated to a process is split into Kernel space and User space. Kernel space is the area of memory into which the portion of the kernel that assists the program in managing and communicating with hardware resources is loaded. This includes kernel code, the kernel's own memory area, and space marked Reserved. In this chapter, we will focus only on the User space, as that is the area actually used by the program. The kernel space of virtual memory is not accessible to the program.
The user space is segregated into several memory segments, which are described here:
• Text segment contains the program's code and other read-only data such as string
literals and const parameters. This portion is directly loaded from the program
binary (executable or library).
• Data segment stores global and static variables that are initialized with
non-zero values.
• BSS segment contains uninitialized variables.
• Heap is used for dynamic memory allocation. The address space of the process
continues to grow as memory gets allocated on the heap. The heap grows upward,
which means new items are added at addresses greater than previous items.
• Stack is used for local variables, and also function parameters (in some platform
architectures). Stacks grow downwards, which means that items put earlier in the
stack occupy lower address spaces.
Tip
Note that the stack and the heap are allocated at opposite ends of the process
address space. As the stack size increases, it grows downwards, and as the heap
size increases, it grows upwards. In the event that they meet, a stack overflow
error occurs or a memory allocation call on the heap will fail.
• In between the stack and the heap, there is also the area where any shared memory
(memory shared across processes), shared libraries used by the program, or
memory-mapped areas (areas of memory that reflect a file on a disk) are located.
• Above the stack, there is a segment where command-line arguments passed to the
program and the environment variables set for the process are stored.
Memory management is a complex topic and a lot of details have been left out in the
interest of keeping the discussion focused on memory management in Rust. However, the
basics of virtual memory management and virtual memory addresses described earlier are
critical for understanding the next section on how Rust performs memory management.
Let's walk through this figure to understand the memory layout of a Rust program:
• Rust process: When a Rust executable binary (for example, created using cargo
build) is read into system memory by the kernel and executed, it becomes a
process. The operating system assigns each process its own private user space so
that different Rust processes don't interfere with each other accidentally.
• Text segment: Executable instructions of the Rust program are placed here. This is
placed below the stack and heap to prevent any overflows from overwriting it. This
segment is read-only so that its contents are not accidentally overwritten. However,
multiple processes can share the text segment. Let's take the example of a text editor written in Rust running in process 1. If a second copy of the editor is executed, the system will create a new process with its own private memory space (let's call it process 2), but will not reload the program instructions of the editor. Instead, it will create a reference to the text instructions of process 1. The rest of the memory (the data, stack, and so on), however, is not shared across processes.
• Data segment: The data segment can be divided into initialized variables (such
as variables declared as static), uninitialized variables (also known as bss or block
started by symbol), and the heap. During execution, if the program asks for more
memory, it is allocated in the heap area. The heap is thus associated with dynamic
memory allocation.
• Stack segment: The stack is the region of the process memory that stores
temporary (local) variables, function parameters, and the return address of the
instruction (which is to be executed after the function call is over). By default,
all memory allocations in Rust are on the stack. Whenever a function is called,
its variables get memory-allocated on the stack. Memory allocation happens in
contiguous memory locations one above the other, in a stack data structure.
• The code instructions of a Rust program go into the text segment area.
• The primitive data types are allocated on the stack.
• The static variables are located in the data segment.
• The heap-allocated values (values whose size is not known at compilation time,
such as vectors and strings) are stored in the heap area of the data segment.
• The uninitialized variables are in the BSS segment.
Of these, the Rust programmer does not have much control over the text segment and BSS
segments, and only primarily works with the stack, heap, and static areas of memory. In
the next section, we will delve into the characteristics of these three memory areas.
Table 5.1 – Characteristics of the stack, heap, and static memory areas
In this section, we have covered the memory layout of Rust programs and understood the
characteristics of the stack and data segment memory areas. In the next section, we will
provide an overview of the Rust memory management lifecycle and a comparison with
other programming languages. We will also look at the three steps of the Rust memory
management lifecycle in detail.
The memory management lifecycle of a program broadly consists of three steps:
1. Memory allocation
2. Memory use and manipulation
3. Memory release (deallocation)
The way these three steps are performed varies across programming languages.
High-level languages (such as Java, JavaScript, and Python) hide a lot of the details of
memory management from the programmer (who has limited control), automate memory
deallocation using a garbage collector component, and do not provide direct access to
memory pointers to the programmer.
Low-level (also known as system) programming languages such as C/C++ provide a
complete degree of control to the programmer but do not provide any safety nets. Managing
memory efficiently is left solely to the skills and meticulousness of the developer.
Rust combines the best of both worlds. A Rust programmer has full control over memory
allocation, being able to manipulate and move around values and references in memory,
but is subjected to strict Rust ownership rules. Memory deallocation is automated by the
compiler-generated code.
We have seen an overview of the memory management approaches of Rust versus other
programming languages. Let's now see them in more detail in the following subsections.
Memory allocation
Memory allocation is the process of storing a value (it can be an integer, string, vector, or
higher-level data structures such as network ports, parsers, or e-commerce orders) to a
location in memory. As part of memory allocation, a programmer instantiates a data type
(primitive or user-defined) and assigns an initial value to it. The Rust program invokes
system calls to allocate memory.
In higher-level languages, the programmer declares variables using the specified syntax.
The language compiler (in conjunction with the language runtime) handles the allocation
and exact location of the various data types in virtual memory.
In C/C++, the programmer controls memory allocation (and reallocation) through the
system call interfaces provided. The language (compiler, runtime) does not intervene in
the programmer's decision.
In Rust, by default, when the programmer initializes a data type and assigns it a value,
the operating system allocates memory on the stack. This applies to all primitive types
(integers, floating points, char, Boolean, fixed-length arrays), function local variables,
function parameters, and other fixed-length data types (such as smart pointers). But the
programmer has the option to explicitly place a primitive data type on the heap by using
Box<T> smart pointers. Secondly, all dynamic values (for example, strings and vectors
whose size changes at runtime) are stored on the heap, and the smart pointer to this heap
data is placed on the stack. To summarize, for fixed-length variables, values are stored on
the stack, variables with a dynamic length are allocated memory on the heap segment, and
a pointer to the starting location of heap-allocated memory is stored on the stack.
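A short sketch of these three cases (fixed-size value on the stack, an explicit Box placement on the heap, and a dynamic String whose bytes go on the heap):

```rust
fn main() {
    let on_stack: i64 = 5;         // fixed-size value, stored on the stack
    let on_heap = Box::new(5_i64); // the same value, explicitly placed on the heap
    let s = String::from("hello"); // pointer/length/capacity on the stack, bytes on the heap
    assert_eq!(on_stack, *on_heap);
    println!("{} == {}, and '{}' stores its bytes on the heap", on_stack, *on_heap, s);
}
```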
Let's now look at some additional information about memory allocation.
All data types declared in a Rust program have their size calculated at compile time; they
are not dynamically allocated or freed. So what, then, is dynamic?
When there are values that change over time (for example, a String whose value is
not known at compile time or a collection where the number of elements is not known
upfront), these are allocated at runtime on the heap, but a reference to such data is stored
as a pointer (which has a fixed size) on the stack.
For example, run the following code:

use std::mem;

fn main() {
    println!("Size of string is {:?}", mem::size_of::<String>());
}
When you run this program on a 64-bit system, the size of String will be printed as
24, meaning it takes 24 bytes. Have you noticed that we are printing the size of String
without even creating a string variable or assigning a value to it? This is because Rust
does not care how long a string is, in order to compute its size. Sound strange? This is
how it works.
In Rust, String is a smart pointer. This is illustrated in Figure 5.6. It has three components: a pointer to the bytes (stored on the heap), a length, and a capacity. Each of these three components is one machine word in size, so on 64-bit architectures each component of the String smart pointer occupies 64 bits (8 bytes); hence, the total size occupied by a variable of the String type is 24 bytes. This is regardless of the actual value contained in the string, which is stored on the heap, while the smart pointer (24 bytes) is stored on the stack. Note that even though the size of the String smart pointer is fixed, the actual size of the memory allocated on the heap may vary as the value of the string changes during program runtime.
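A quick way to see both facts at once — the fixed three-machine-word smart pointer on the stack, and the independently varying heap allocation — is a sketch like this:

```rust
use std::mem;

fn main() {
    // pointer + length + capacity = three machine words
    assert_eq!(mem::size_of::<String>(), 3 * mem::size_of::<usize>());
    let mut s = String::with_capacity(10); // heap allocation of at least 10 bytes
    s.push_str("hi");
    // The stack-side size is fixed; len and capacity describe the heap side.
    println!("len = {}, capacity = {}", s.len(), s.capacity());
}
```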
• First, all variables in Rust are immutable by default. If a value contained in a variable
needs to be altered, the variable has to be declared explicitly as mutable (with the
mut keyword).
• Secondly, there are ownership rules that apply to data access, which are listed in
a later subsection.
• Third, there are rules of references (borrowing) that apply when it comes to sharing
a value with one or more variables, which is also covered later.
• Fourth, there are lifetimes, which give information to the compiler about how two
or more references relate to each other. This helps the compiler prevent memory
safety issues by checking if the references are valid.
These concepts and rules make programming in Rust very different (and more difficult at
times) from other programming languages. But it is also these very concepts that impart
super-powers to Rust in areas of memory and thread-safety. Importantly, Rust provides
these benefits without runtime costs.
Let's now recap the Rust rules for ownership and for borrowing and references in the
subsections that follow.
The really interesting aspect of Rust is that these ownership rules are not meant for the programmer to memorize; the Rust compiler enforces them. Another significant implication of these ownership rules is that they also ensure thread safety, in addition to memory safety.
In this subsection, we have covered the rules governing the manipulation of variables and values in memory. In the next subsection, we will look at the last aspect of the memory management lifecycle, which is deallocating memory after use.
Memory deallocation
Memory deallocation deals with the question of how to release memory back to the
operating system from the Rust program. Stack-allocated values are automatically
released, as this is a managed-memory area. Static variables have a lifetime until the end
of the program, so they get released automatically when the program terminates. The real
question around memory release applies to heap-allocated memory.
Some of these values may not be required to be held in memory until the end of the program, in which case they can be released earlier. But the mechanism of such memory release varies widely across different programming language groups.
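For Rust specifically, release is tied to scope: when a value's owner goes out of scope, compiler-generated code runs the value's Drop implementation and returns its memory. A minimal sketch (the Resource type is purely illustrative):

```rust
struct Resource(&'static str);

impl Drop for Resource {
    fn drop(&mut self) {
        println!("releasing {}", self.0);
    }
}

fn main() {
    let _outer = Resource("outer");
    {
        let _inner = Resource("inner");
    } // "releasing inner" is printed here, at the end of the inner scope
    println!("end of main");
} // "releasing outer" is printed here, after main's body completes
```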
We have so far seen the rules governing memory allocation, manipulation, and release
in Rust programs. All these collectively aim to achieve the primary goal of memory
safety without an external garbage collector, which is truly one of the highlights of the
Rust programming language. The following callout section describes the various types of
memory vulnerabilities and how Rust prevents them.
• Double-free: Attempting to release the same memory location(s) more than once.
This can result in undefined behavior or memory corruption. Rust ownership rules
allow the release of memory only by the owner of a value, and at any point, there
can be only one owner of a value allocated in the heap. Rust thus prevents this class
of memory safety bugs.
• Use-after-free: A memory location is accessed after it has been released by the
program. The memory being accessed may have been allocated to another pointer,
so the original pointer to this memory may inadvertently corrupt the value at the
memory location causing undefined behavior or security issues through arbitrary
code execution. Rust reference and lifetime rules enforced by the borrow checker in
the compiler always ensure that a reference is valid before use. Rust borrow checker
prevents a situation where a reference outlives the value it points to.
• Buffer overflow: The program attempts to store a value in memory beyond the
allocated range. This can corrupt data, cause a program to crash, or result in the
execution of malicious code. Rust associates capacity with a buffer and performs
bounds check on access. So, in safe Rust code, it is not possible to overflow a buffer.
Rust will panic if you attempt to write out of bounds.
• Uninitialized memory use: The program reads data from a buffer that was
allocated but not initialized with values. This causes undefined behavior because
the memory location can hold indeterminate values. Rust prevents reading from
uninitialized memory.
• Null pointer dereference: The program writes to memory with a null pointer,
causing segmentation faults. A null pointer is not possible in safe Rust because Rust
ensures that a reference does not outlive the value it refers to, and Rust's lifetime
rules require functions manipulating references to declare how the references from
input and output are linked, using lifetime annotations.
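The buffer overflow protection described above can be seen directly: indexing past the end of a Vec panics at runtime rather than reading or writing adjacent memory, and the non-panicking accessor makes the bounds check visible in the return type:

```rust
fn main() {
    let buf = vec![1, 2, 3];
    // buf[10] would panic with "index out of bounds" instead of reading
    // past the allocation. The get() accessor returns None instead:
    assert_eq!(buf.get(10), None);
    assert_eq!(buf.get(1), Some(&2));
    println!("out-of-bounds read rejected: {:?}", buf.get(10));
}
```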
We have thus seen how Rust achieves memory safety through its unique system of
immutable-by-default variables, ownership rules, lifetimes, reference rules, and
borrow-checker.
With this, we conclude this section on the Rust memory management lifecycle. In the next
section, we will implement a dynamic data structure in Rust.
Figure 5.7 – Conceptual model of the template engine (from Chapter 3, Introduction to the Rust
Standard Library and Key Crates for Systems Programming)
You will recall that we implemented a template engine in Chapter 3, Introduction to
the Rust Standard Library and Key Crates for Systems Programming, to parse an input
statement with a template variable and convert it into a dynamic HTML statement using
context data provided. We will enhance the template variable feature in this section. We
will first discuss the design changes and then implement the code changes.
You will notice that there are two template variables in the input statement
here—name and city. We will have to enhance our design to support this, starting
with the ExpressionData struct, which stores the result of the parsing of the
template-variable statement.
Let's look at the ExpressionData data structure. We can start with the code from Chapter 3, located at [Link] System-Programming-for-Rust-Developers/tree/master/Chapter03:
#[derive(PartialEq, Debug)]
pub struct ExpressionData {
pub head: Option<String>,
pub variable: String,
pub tail: Option<String>,
}
In our implementation, the input value of <p> Hello {{name}}. How are you?
</p> will be tokenized into the ExpressionData struct as follows:
Head = Hello
Variable = name
Tail = How are you?
The string literal before the template variable was mapped to the Head field in
ExpressionData, and the string literal after the template variable was mapped
to the Tail field of ExpressionData.
As you can see, we have made provision for only one template variable in the data structure (the variable field is of type String). In order to accommodate multiple template variables in a statement, we must alter the struct to allow the variable field to store more than one template variable entry.
In addition to allowing multiple template variables, we also need to accommodate a more flexible structure of input statements. In our current implementation, we accommodate one string literal before the template variable, and one literal after it. But in the real world, an input statement can have any number of string literals, as shown in the following example:
<p> Hello , Hello {{name}}. Can you tell me if you are living
in {{city}}? For how long? </p>
• Allow for the parsing of more than one template variable per statement
• Allow for the parsing of more than two string literals in the input statement
To allow for these changes, we have to redesign the ExpressionData struct. We also
need to modify the methods that deal with ExpressionData to implement the parsing
functionality for these two changes.
Let's review the summary of changes to be made to the design, which is shown in Figure
5.8. This figure is from Chapter 3, Introduction to the Rust Standard Library and Key Crates
for Systems Programming, but the components to be changed are highlighted in the figure:
src/[Link]
We have fully revamped the structure of ExpressionData. It now has three fields.
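Based on the parsing code shown later in this section, the revamped struct can be sketched as follows. The var_map and gen_html field names appear in that code; the name "expression" for the field holding the raw input statement is an assumption for illustration:

```rust
// Sketch of the revamped struct; the "expression" field name is illustrative.
#[derive(PartialEq, Debug)]
pub struct ExpressionData {
    pub expression: String,   // the raw input statement
    pub var_map: Vec<String>, // all template variables found, e.g. ["{{name}}", "{{city}}"]
    pub gen_html: String,     // the generated HTML output
}

fn main() {
    let e = ExpressionData {
        expression: "<p> Hello {{name}} </p>".to_string(),
        var_map: vec!["{{name}}".to_string()],
        gen_html: String::new(),
    };
    println!("{:?}", e);
}
```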
Due to this change to the structure of ExpressionData, we have to alter the following
two functions: get_expression_data() and generate_html_template_var():
src/[Link]
src/[Link]
pub fn generate_html_template_var(
    content: &mut ExpressionData,
    context: HashMap<String, String>,
) -> &mut ExpressionData {
    // Start from the raw input statement stored in the struct
    content.gen_html = content.expression.clone();
    for var in &content.var_map {
        let (_h, i) = get_index_for_symbol(&var, '{');
        let (_j, k) = get_index_for_symbol(&var, '}');
        let var_without_braces = &var[i + 2..k];
        // Look up the variable's value in the context HashMap
        let val = context.get(var_without_braces).unwrap();
        content.gen_html = content.gen_html.replace(var, val);
    }
    content
}
This function accepts two inputs—the ExpressionData type and the context
HashMap. Let's understand the logic through an example. Let's also assume the following
input values are passed to the function:
1. We iterate through the list of template variables contained in the var_map field
of content.
2. For each iteration, we first strip out the leading and trailing curly braces from the
template variable values stored in the var_map field of content. So {{name}}
becomes name and {{city}} becomes city. We then look them up in the
context HashMap and retrieve the value (yielding Bob and Boston).
3. The last step is to replace all instances of {{name}} in the input string with Bob
and all instances of {{city}} with Boston. The resultant string is stored in the
gen_html field of the content struct, which is of type ExpressionData.
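The brace-stripping and replacement logic in the steps above can be sketched in isolation. This simplified version slices the braces off directly instead of using the chapter's get_index_for_symbol helper:

```rust
use std::collections::HashMap;

fn main() {
    let mut context: HashMap<String, String> = HashMap::new();
    context.insert("name".to_string(), "Bob".to_string());
    let var = "{{name}}";
    // Strip the leading "{{" and trailing "}}" to get the lookup key:
    let key = &var[2..var.len() - 2];
    assert_eq!(key, "name");
    // Replace every occurrence of the variable in the input string:
    let html = "<p> Hello {{name}} </p>".replace(var, context.get(key).unwrap());
    assert_eq!(html, "<p> Hello Bob </p>");
    println!("{}", html);
}
```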
And finally, we will modify the main() function as follows. The main change in the main() function, compared to Chapter 3, Introduction to the Rust Standard Library and Key Crates for Systems Programming, is the change in the parameters passed to the generate_html_template_var() function:
src/[Link]
use std::collections::HashMap;
use std::io;
use std::io::BufRead;
use template_engine::*;
fn main() {
    let mut context: HashMap<String, String> = HashMap::new();
    context.insert("name".to_string(), "Bob".to_string());
    context.insert("city".to_string(), "Boston".to_string());
    // ... the rest of main() reads the input statement and
    // generates the HTML output
}
With these changes, we can run the program with cargo run, and enter the following in
the command line:
You will see the generated HTML statement displayed on your terminal. With these changes, we have achieved the two goals we set out earlier:
• Allow for the parsing of more than one template variable per statement
• Allow for the parsing of more than two string literals in the input statement
Summary
In this chapter, we looked in depth at the memory layout of a standard process in the
Linux environment, and then the memory layout of a Rust program. We compared the
memory management lifecycle in different programming languages and how Rust takes
a different approach to memory management. We learned how memory is allocated,
manipulated, and released in a Rust program, and looked at the rules governing memory
management in Rust, including ownership and reference rules. We looked at the different types of memory safety issues and how Rust prevents them by using its ownership model, lifetimes, reference rules, and borrow checker.
We then returned to our template engine implementation example from Chapter 3 and added a couple of features to the template engine. We achieved this by converting
a static data structure into a dynamic data structure and learned how memory is allocated
dynamically. Dynamic data structures are very useful in programs that deal with external
inputs, for example, in programs that accept incoming data from network sockets or
file descriptors, where it is not known in advance what the size of incoming data will
be, which is likely to be the case for most real-world complex programs that you will be
writing using Rust over the course of your professional career.
This concludes the memory management topic. In the next chapter, we will take a closer
look at the Rust Standard Library modules that deal with file and directory operations.
Further reading
Understanding Ownership in Rust: [Link]
6
Working with Files
and Directories
in Rust
In the previous chapter, we looked at the details of how Rust uses memory, a key
system resource.
In this chapter, we will look at how Rust interacts with another important class of
system resources – files and directories. The Rust Standard Library offers a rich set of
abstractions that enable platform-independent file and directory operations.
For this chapter, we will review the basics of how files are managed by Unix/Linux, and
master the key APIs that the Rust Standard Library provides to deal with files, paths, links,
and directories.
Using the Rust Standard Library, we will implement a shell command, rstat, that counts
the total number of lines of Rust code in a directory (and its subfolders), and provides
a few additional source code metrics.
We will cover the topics in the following order:
Technical requirements
Verify that rustc and cargo have been installed correctly with the following commands:
rustc --version
cargo --version
The Git repo for the code in this chapter can be found at [Link] com/PacktPublishing/Practical-System-Programming-for-Rust-Developers/tree/master/Chapter06.
Another aspect that is unique to Linux/Unix is the philosophy that everything is a file.
Here, everything refers to system resources. There can be many types of files on
Linux/Unix:
In this chapter, we will focus on files, directories, and links. However, the universality of
the Unix I/O model means that the same set of system calls used to open, read, write, and
close regular files can also be used on any other types of files such as device files. This is
achieved in Linux/Unix by standardizing the system calls, which are then implemented by
various filesystems and device drivers.
Linux/Unix also provides a unified namespace for all its files and directories. Files and
directories organized into a hierarchy are called a filesystem. Many different filesystems
can be added to or removed from the namespace through mounting and unmounting.
For example, a CD-ROM drive can be mounted at /mnt/cdrom, which then becomes the mount point through which the root directory of that filesystem is accessed.
The mount namespace of a process is the set of all mounted filesystems it sees. A process
that makes system calls for file operations operates on the set of files and directories that it
sees as a part of its mount namespace.
The Unix/Linux system calls (Application Programming Interface - API) model for file
operations hinges on four operations: open, read, write, and close, all of which work with
the concept of file descriptors. What is a file descriptor?
A file descriptor is a handle to a file. Opening a file returns a file descriptor, and other
operations such as reading, writing, and closing use the file descriptor.
Let's now look at the common system calls associated with file operations, which the
operating system exposes:
• open(): This system call opens an existing file. It can also create a new file if the
file does not exist. It accepts a pathname, the mode in which the file is to be opened,
and flags. It returns a file descriptor that can be used in subsequent system calls to
access the file:
int open(const char *pathname, int flags, ... /* mode_t mode */);
There are three basic modes in which to open a file – read only, write only, and read-
write. In addition, flags are specified as arguments to the open() system call. An
example of a flag is O_CREAT, which tells the system call to create a file if the file
does not exist, and returns the file descriptor.
If there is an error in opening a file, -1 is returned in place of the file descriptor,
and the error number (errno) returned specifies the reason for the error. File open
calls can fail for a variety of reasons including a permissions error and the incorrect
path being specified in an argument to a system call.
• read(): This system call accepts three arguments: a file descriptor, the number of
bytes to be read, and the memory address of the buffer into which the data read is
to be placed. It returns the number of bytes read. -1 is returned in the event of an
error when reading the file.
Doing file I/O in Rust 177
• write(): This system call is similar to read(), in that it also takes three
parameters – a file descriptor, a buffer pointer from which to read the data, and the
number of bytes to write from the buffer. Note that successful completion of the
write() system call does not guarantee that the bytes have been written to disk
immediately, as the kernel buffers I/O to disk for performance and
efficiency reasons.
• close(): This system call accepts a file descriptor and releases it. If a close()
call is not explicitly invoked for a file, all open files are closed when the process
terminates. But it is good practice to release file descriptors (when no longer
needed) for reuse by the kernel.
• lseek(): For each open file, the kernel keeps track of a file offset, which represents
the location in the file at which the next read or write operation will happen. The
system call lseek() allows you to reposition the file offset to any location in the
file. The lseek() system call accepts three arguments – the file descriptor, an
offset, and a reference position. The reference position can take three values – start
of file, current cursor position, or end of file. The offset specifies the number of bytes
relative to the reference position that the file offset should be pointed to, for the next
read() or write().
This concludes the overview of terminologies and key concepts of how operating systems
manage files as system resources. We have seen the main system calls (syscalls) in
Linux for working with files. We will not be directly using these syscalls in this book.
But we will work with these syscalls indirectly, through the Rust Standard Library
modules. The Rust Standard Library provides higher-level wrappers to make it easier to
work with these syscalls. These wrappers also allow Rust programs to work without
necessarily understanding all the differences in syscalls across different operating
systems. However, gaining basic knowledge of how operating systems manage files gives
us a glimpse into what goes on under the hood when we use the Rust Standard Library for
file and directory operations.
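Although this book will not invoke these syscalls directly, the mapping from the four syscalls plus lseek() onto the Rust Standard Library can be sketched as follows (the file name syscall_demo.txt is hypothetical):

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};

fn main() -> std::io::Result<()> {
    // open() with O_CREAT | O_WRONLY | O_TRUNC roughly corresponds to File::create()
    let mut file = File::create("syscall_demo.txt")?; // hypothetical file name
    // The write() syscall is reached through the Write trait
    file.write_all(b"hello, syscalls")?;

    // open() with O_RDONLY roughly corresponds to File::open()
    let mut file = File::open("syscall_demo.txt")?;
    // lseek() is reached through the Seek trait: move the file offset to byte 7
    file.seek(SeekFrom::Start(7))?;
    let mut buf = String::new();
    // The read() syscall is reached through the Read trait
    file.read_to_string(&mut buf)?;
    println!("{}", buf); // prints "syscalls"
    std::fs::remove_file("syscall_demo.txt")?;
    Ok(())
    // close() happens implicitly when `file` is dropped
}
```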
In the next section, we will cover how to do file I/O in Rust.
The primary module in the Rust Standard Library for working with files is std::fs. The
official documentation for std::fs can be found in the Rust Standard Library API
reference. This documentation provides the set of methods, structs,
enums, and traits that collectively provide features for working with files. It helps to study
the structure of the std::fs module to gain a deeper understanding. However, for those
starting out with exploring system programming in Rust, it is more useful to begin with
a mental model of what kinds of things a programmer would like to do with files, and map
it back to the Rust Standard Library. This is what we will do in this section. The common
lifecycle operations for a file are shown in Figure 6.1.
The common things programmers like to do with files include creating a file, opening
and closing files, reading and writing files, accessing metadata about files, and setting file
permissions. These are shown in Figure 6.1. Descriptions of how to perform each of these
file operations using the Rust Standard Library are provided here:
• Create: The create operation simply creates a new file with the specified name,
at the specific location in the filesystem. The corresponding function call in the
std::fs module is File::create(), which allows you to create a new file and
write to it. Custom permissions for the file to be created can be specified using the
std::fs::OpenOptions struct. An example of a create operation using the
std::fs module is shown in the code snippet here:
use std::fs::File;
fn main() {
    let file = File::create("./[Link]");
}
• Open: The open operation opens an existing file, given the full path to the file in the
filesystem. The function call to be used is std::fs::File::open(). This opens
a file in read-only mode by default. The std::fs::OpenOptions struct can
be used to set custom options when opening the file. Two ways to open a file are
shown below. The first call returns a Result type, which we handle
using .expect(), which panics with a message if the file is not found. The second
uses OpenOptions to set additional options on the file to be opened.
In the example shown, we are opening the file for writing, and also
asking for the file to be created if it is not already present:
use std::fs::File;
use std::fs::OpenOptions;
fn main() {
    // Method 1
    let _file1 = File::open("[Link]").expect("File not found");
    // Method 2
    let _file2 = OpenOptions::new()
        .write(true)
        .create(true)
        .open("[Link]");
}
• Copy: This is simply a byte-by-byte copy of the contents of one file to another.
The std::fs::copy() function can be used to copy the contents of one file to
another, overwriting the latter. An example is shown here:
use std::fs;
fn main() {
    fs::copy("[Link]", "[Link]").expect("Unable to copy");
}
• Rename: This is an operation that renames a specified file to a new name. Errors
can occur if the from file does not exist, or if permissions are insufficient. In Rust,
the std::fs::rename() function can be used for this purpose. If the to file
exists, it is replaced. One thing to note is that there can be more than one filesystem
mounted (at various points) within the mount namespace of a process, as seen in
the previous section. The rename method in Rust will work only if both the from
and to file paths are in the same filesystem. An example of usage of the rename()
function is shown here:
use std::fs;
fn main() {
    fs::rename("[Link]", "[Link]").expect("Unable to rename");
}
• Read: The read operation takes a filename with its path and reads the contents.
In the std::fs module, there are two functions available: fs::read() and
fs::read_to_string(). The former reads the contents of a file into a bytes
vector. It pre-allocates a buffer based on file size (when available). The latter reads
the contents of a file directly into a string. Examples are shown here:
use std::fs;
fn main() {
    let byte_arr = fs::read("[Link]").expect("Unable to read file into bytes");
    println!(
        "Value read from file into bytes is {}",
        String::from_utf8(byte_arr).unwrap()
    );
    let string1 = fs::read_to_string("[Link]").expect("Unable to read file into string");
    println!("Value read from file into string is {}", string1);
}
In the code snippet shown for fs::read(), we convert the byte_arr into
a string for printing purposes, as printing out a byte array is not human-readable.
• Write: The write operation writes the contents of a buffer into a file. In std::fs,
the fs::write() function accepts a filename and a byte slice, and writes the byte
slice as the contents of the file. An example is shown here:
use std::fs;
fn main() {
    fs::write("[Link]", "Rust is exciting, isn't it?").expect("Unable to write to file");
}
• Query: These operations deal with obtaining metadata about files. There are
several query methods available on files in the std::fs module. The functions
is_dir(), is_file(), and is_symlink() respectively check whether
a path is a directory, a regular file, or a symlink. The modified(), created(),
accessed(), len(), and metadata() functions are used to retrieve file
metadata information. The permissions() function is used to retrieve the
permissions set on the file.
A few examples of the usage of query operations are shown here:
use std::fs;
fn main() {
    let file_metadata = fs::metadata("[Link]").expect("Unable to get file metadata");
    println!(
        "Len: {}, last accessed: {:?}, modified : {:?}, created: {:?}",
        file_metadata.len(),
        file_metadata.accessed(),
        file_metadata.modified(),
        file_metadata.created()
    );
}
• Metadata: Metadata for a file includes details such as the file type, file
permissions, last accessed time, created time, and so on. Permissions for a file can
be set using set_permissions(). An example is shown here that sets the file
permission to read-only; any subsequent write operation on the file will then fail:

use std::fs;
fn main() {
    let mut permissions = fs::metadata("[Link]").unwrap().permissions();
    permissions.set_readonly(true);
    fs::set_permissions("[Link]", permissions).expect("Unable to set permission");
}
• Close: In Rust, files are automatically closed when they go out of scope. There is no
specific close() method in the Rust Standard Library to close files.
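The implicit close can be made explicit by ending the handle's scope with drop(); a minimal sketch (the file name close_demo.txt is hypothetical):

```rust
use std::fs::File;

fn main() -> std::io::Result<()> {
    let file = File::create("close_demo.txt")?; // hypothetical file name
    // drop() ends the handle's scope early; the underlying file
    // descriptor is released here instead of at the end of main()
    drop(file);
    std::fs::remove_file("close_demo.txt")?;
    Ok(())
}
```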
In this section, we saw the key function calls from the Rust Standard Library that can be
used to perform file manipulation and query operations. In the next section, we will take
a look at how the Rust Standard Library can be used for directory and path operations.
Learning directory and path operations 183
In the Rust Standard Library, the std::fs module contains methods to work with
directories, and the std::path module contains methods to work with paths.
Just as in the previous section, we will look at the common programming tasks involving
directory and path manipulations. These are shown in Figure 6.2 and detailed here:
1. Read details of directory entries: In order to write system programs that deal with
files and directories, it is necessary to understand how to read through the structure
of a directory, retrieve the directory entries, and get their metadata. This is achieved
by using functions in the std::fs module. The std::fs::read_dir()
function can be used to iterate through and retrieve the entries in a directory. From
the directory entry thus retrieved, the metadata details of the directory entry can be
obtained with the functions path(), metadata(), file_name(), and
file_type(). Examples of how to do this are shown here:
use std::fs;
use std::path::Path;
fn main() {
    let dir_entries = fs::read_dir(".").expect("Unable to read directory contents");
    // Read directory contents
    for entry in dir_entries {
        //Get details of each directory entry
        let entry = entry.unwrap();
        let entry_path = entry.path();
        let entry_metadata = entry.metadata().unwrap();
        let entry_file_type = entry.file_type().unwrap();
        let entry_file_name = entry.file_name();
        println!(
            "Path is {:?}.\n Metadata is {:?}\n File_type is {:?}.\n Entry name is {:?}.\n",
            entry_path, entry_metadata, entry_file_type, entry_file_name
        );
    }
}
Note that there are two other functions also available to create directories.
create_dir() and create_dir_all() in std::fs can be used for
this purpose.
Likewise, the functions remove_dir() and remove_dir_all() in the
std::fs module can be used to delete directories.
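A brief sketch of these four directory functions (the directory names are hypothetical):

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    // create_dir() creates a single directory; its parent must already exist
    fs::create_dir("demo_dir")?; // hypothetical name
    // create_dir_all() also creates any missing parent directories
    fs::create_dir_all("demo_dir/a/b")?;
    // remove_dir() removes a single empty directory
    fs::remove_dir("demo_dir/a/b")?;
    // remove_dir_all() removes a directory and everything under it
    fs::remove_dir_all("demo_dir")?;
    Ok(())
}
```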
Next, we'll look at how to construct path strings dynamically.
In the code shown, a new variable of type PathBuf is constructed, and the various
path components are dynamically added to create a fully qualified path.
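The listing referred to above falls outside this extract; a minimal sketch of building a path dynamically with PathBuf might look like this (the component names are hypothetical):

```rust
use std::path::PathBuf;

fn main() {
    let mut path = PathBuf::new();
    // push() appends components, inserting separators automatically
    path.push("/tmp");      // hypothetical root
    path.push("projects");  // hypothetical directory
    path.push("main.rs");   // hypothetical file name
    println!("{}", path.display()); // prints "/tmp/projects/main.rs"
}
```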
This concludes this subsection on directory and path operations with the Rust
Standard Library.
In this section, we looked at how to use the Rust Standard Library to read through
directory entries, get their metadata, construct a directory structure programmatically,
get path components, and build a path string dynamically.
In the next section, we will look at how to work with links and queries.
Setting hard links, symbolic links, and performing queries 187
• Create a hard link: The Rust std::fs module has a function, fs::hard_link,
that can be used to create a new hard link on the file system. An example is
shown here:
use std::fs;
fn main() -> std::io::Result<()> {
    fs::hard_link("[Link]", "./[Link]")?; // Hard link created
    Ok(())
}
The fs::read_link function can be used to read the target of a symbolic link.
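The corresponding listing is not reproduced in this extract; a minimal Unix-only sketch of creating and then reading a symbolic link might look like this (the file names are hypothetical):

```rust
use std::fs;
use std::os::unix::fs::symlink;

fn main() -> std::io::Result<()> {
    fs::write("link_target.txt", "hello")?; // hypothetical file name
    // Create a symbolic link pointing at link_target.txt
    symlink("link_target.txt", "link_name.txt")?;
    // read_link() returns the path that the symlink points to
    let target = fs::read_link("link_name.txt")?;
    println!("{}", target.display()); // prints "link_target.txt"
    fs::remove_file("link_name.txt")?;
    fs::remove_file("link_target.txt")?;
    Ok(())
}
```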
With this, we conclude the subsection on working with links in the Rust Standard Library.
We have so far seen how to work with files, directories, paths, and links in Rust. In the
next section, we will build a small shell command that demonstrates the practical use of
the Rust Standard Library for file and directory operations.
Here is an example of the result you will see from this shell command:
This section is structured as four sub-sections. In the first sub-section, we will see an
overview of the code structure and a summary of steps to build this shell command.
Then, in three different subsections, we will review the code for the three source files
corresponding to error handling, source metric computation, and the main program.
Code overview
In this subsection, we will look at how the code is structured for the shell command. We
will also review a summary of the steps to build the shell command. Let's get started.
The code structure is shown in Figure 6.4:
Here is a summary of the steps to build the shell command. The source code snippets are
shown later in this section:
1. Create project: Create a new project with the following command and change
directory into the rstat directory:
cargo new rstat && cd rstat
2. Create source files: Create three files under the src folder – main.rs,
errors.rs, and srcstats.rs.
3. Define custom error handling: In errors.rs, create a struct, StatsError,
to represent our custom error type. This will be used to unify error handling
in our project and to send messages back to the user. Implement the following
four traits on struct StatsError: fmt::Display, From<&str>,
From<io::Error>, and From<std::num::TryFromIntError>.
4. Define logic for computing source stats: In srcstats.rs, create a struct,
SrcStats, to define the source metrics to be computed. Define two functions:
get_src_stats_for_file() (which accepts a filename as an argument and
computes the source metrics for that file) and get_summary_src_stats()
(which takes a directory name as an argument and computes source metrics for all
files in that directory root).
5. Write the main() function to accept command-line parameters:
In main.rs, define an Opt struct to define command-line parameters and flags for
the shell command. Write the main() function, which accepts a source directory
name from the command line and invokes the get_summary_src_stats()
method in the srcstats module. Ensure that structopt is included in
Cargo.toml under dependencies.
6. Build the tool with the following command:
cargo build --release
Alternatively, add the rstat binary to the path, and set LD_LIBRARY_PATH to
run the shell command like this:
target/release/rstat -m src <src-folder>
Writing a shell command in Rust (project) 191
8. View the consolidated source stats printed to the terminal and confirm the
metrics generated.
Let's now look at the code snippets for the steps listed previously. We will start by defining
custom error handling.
Error handling
While executing our shell command, several things can go wrong. The source folder
specified may be invalid. The permissions may be insufficient to view the directory entries.
There can be other types of I/O errors, such as the variants of the
std::io::ErrorKind enum listed in the standard library documentation. In order for us
to give a meaningful message back to the user, we will create a custom error type. We
will also write conversion methods that will automatically convert different types of
I/O errors into our custom error type by implementing various From traits. All this
code is stored in the errors.rs file.
Let's review the code snippets from this file in two parts:
• Part 1 covers the definition of the custom error type and Display trait
implementation.
• Part 2 covers the various From trait implementations for our custom error type.
src/errors.rs (part-1)
use std::fmt;
use std::io;
#[derive(Debug)]
pub struct StatsError {
    pub message: String,
}
impl fmt::Display for StatsError {
    fn fmt(&self, f: &mut fmt::Formatter) -> Result<(), fmt::Error> {
        write!(f, "{}", self.message)
    }
}
Here the StatsError struct is defined with a field message that will be used to store
the error message, which will get propagated to the user in case of errors. We have also
implemented the Display trait to enable the error message to get printed to the console.
Let's now see part 2 of the errors.rs file. Here, we provide the various From trait
implementations for our custom error type. Code annotations are numbered, and are
described after the code listing:
src/errors.rs (part-2)
The source code annotations (shown with numbers) are detailed here:
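The part-2 listing itself is not reproduced in this extract; a sketch of the three From implementations named in step 3 might look like the following (the struct definition is repeated for self-containment, and the exact message formatting is an assumption):

```rust
use std::io;

#[derive(Debug)]
pub struct StatsError {
    pub message: String,
}

// Convert a plain string slice into StatsError
impl From<&str> for StatsError {
    fn from(s: &str) -> Self {
        StatsError { message: s.to_string() }
    }
}

// Let the ? operator convert std::io errors into StatsError
impl From<io::Error> for StatsError {
    fn from(e: io::Error) -> Self {
        StatsError { message: e.to_string() }
    }
}

// Let the ? operator convert integer-conversion errors into StatsError
impl From<std::num::TryFromIntError> for StatsError {
    fn from(e: std::num::TryFromIntError) -> Self {
        StatsError { message: e.to_string() }
    }
}

fn main() {
    let err: StatsError = "something went wrong".into();
    println!("{}", err.message); // prints "something went wrong"
}
```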
In this section, we reviewed the code for the errors.rs file. In the next section, we will
see the code for the computation of source code metrics.
Let's look at part 1. The module imports are shown here. The descriptions corresponding
to code annotation numbers are shown after the code listing:
src/srcstats.rs (part-1)
The descriptions for the numbered code annotations are listed here:
We will now look at part 2. The definition of the struct to store computed metrics is
covered here.
The struct SrcStats contains the following source metrics, which will be generated by
our shell command:
The Rust data structure to hold the computed source file metrics is shown next:
src/srcstats.rs (part-2)
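The part-2 listing is absent from this extract; reconstructing from the fields used later in the Ok(SrcStats { .. }) expression, the struct plausibly looks like this (the u32 field types are an assumption):

```rust
// Holds the computed source metrics
#[derive(Debug)]
pub struct SrcStats {
    pub number_of_files: u32,
    pub loc: u32,      // lines of code
    pub comments: u32, // comment lines
    pub blanks: u32,   // blank lines
}

fn main() {
    let stats = SrcStats { number_of_files: 1, loc: 10, comments: 2, blanks: 3 };
    println!("{:?}", stats);
}
```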
Let's look at part 3, which is the main function that computes summary statistics. As this
code is a bit long, we will look at this in three parts:
• In part 3c, we iterate through the list of Rust files and invoke the get_src_
stats_for_file() method to compute source metrics for each file. The results
are consolidated.
Part 3a shows the initialization of variables representing the various metrics that
will be computed by the shell command – total_loc, total_comments, and
total_blanks. Two more variables, dir_entries and file_entries, are
initialized as vector data types, which will be used for intermediate computations.
Part 3b of the get_summary_src_stats() method is shown here:
src/srcstats.rs (part-3b)
In part 3b of the code, we are iterating through the entries within the specified folder and
segregating the entries of the type directory from the entries of the type file, and storing
them in separate vector variables.
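Only fragments of this listing survive in this extract; the segregation logic just described can be sketched like this (the standalone function shape is an assumption; in the book's version this logic lives inside get_summary_src_stats()):

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Separate the immediate entries of `dir` into directories and files
fn segregate_entries(dir: &Path) -> std::io::Result<(Vec<PathBuf>, Vec<PathBuf>)> {
    let mut dir_entries: Vec<PathBuf> = Vec::new();
    let mut file_entries: Vec<PathBuf> = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            dir_entries.push(path);
        } else {
            file_entries.push(path);
        }
    }
    Ok((dir_entries, file_entries))
}

fn main() -> std::io::Result<()> {
    let (dirs, files) = segregate_entries(Path::new("."))?;
    println!("{} dirs, {} files", dirs.len(), files.len());
    Ok(())
}
```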
Part 3c of the get_summary_src_stats() method is shown here:
Ok(SrcStats {
number_of_files: u32::try_from(file_count)?,
loc: total_loc,
comments: total_comments,
blanks: total_blanks,
})
}
We will now look at part 4, which is the code to compute source metrics for an individual
Rust source file:
src/srcstats.rs (part-4)
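The part-4 listing is absent from this extract; a sketch of per-file metric computation consistent with the SrcStats fields might look like this (the counting rules – lines starting with // counted as comments, trimmed-empty lines as blanks – are assumptions):

```rust
use std::fs;
use std::path::Path;

// Compute line-based metrics for a single Rust source file
fn get_src_stats_for_file(path: &Path) -> std::io::Result<(u32, u32, u32)> {
    let contents = fs::read_to_string(path)?;
    let (mut loc, mut comments, mut blanks) = (0u32, 0u32, 0u32);
    for line in contents.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            blanks += 1;
        } else if trimmed.starts_with("//") {
            comments += 1;
        } else {
            loc += 1;
        }
    }
    Ok((loc, comments, blanks))
}

fn main() -> std::io::Result<()> {
    // Write a small hypothetical sample file to demonstrate the counts
    fs::write("sample.rs", "// a comment\n\nfn main() {}\n")?;
    let (loc, comments, blanks) = get_src_stats_for_file(Path::new("sample.rs"))?;
    println!("loc={} comments={} blanks={}", loc, comments, blanks); // loc=1 comments=1 blanks=1
    fs::remove_file("sample.rs")?;
    Ok(())
}
```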
This concludes the code listing for the srcstats module. In this subsection, we
reviewed the code for computing source code metrics. In the next section, we will
review the code for the last part of the code listing, which is the main() function.
The code listing for the main() function is shown in two parts:
• Part 1 shows the structure of the command-line interface for the shell command.
• Part 2 shows the code to invoke calls for the computation of source metrics and to
display the results to the user.
Part 1 of [Link] is shown here. We will use the structopt crate to define the
structure of the command line inputs to be accepted from the user.
Add the following to the [Link] file:
[dependencies]
structopt = "0.3.16"
src/main.rs (part-1)
use std::path::PathBuf;
use structopt::StructOpt;
mod srcstats;
use srcstats::get_summary_src_stats;
mod errors;
use errors::StatsError;
#[derive(Debug, StructOpt)]
#[structopt(
    name = "rstat",
    about = "This is a tool to generate statistics on Rust projects"
)]
struct Opt {
    #[structopt(name = "source directory", parse(from_os_str))]
    in_dir: PathBuf,
    #[structopt(name = "mode", short)]
    mode: String,
}
In part 1 of the code shown, a data structure, Opt, is defined, which contains two fields:
in_dir, representing the path to the input folder (for which source metrics are to be
computed), and mode. The value for mode in our example is src, which indicates
that we want to compute source code metrics. In the future, additional modes can be
added (such as the object mode to compute object file metrics such as the size of the
executable and library object files).
In part 2 of this code, we read the source folder from the user's command-line
arguments, and invoke the get_summary_src_stats() method from the srcstats
module, which we reviewed in the previous subsection. The metrics returned by this
method are then shown to the user in the terminal. Part 2 of the code listing is
shown here:
src/main.rs (part-2)
Part 2 shows the main() function, which is the entry point into our shell command.
The function accepts and parses command-line parameters, and invokes the
get_summary_src_stats() function, passing the source folder specified by the
user as a function parameter. The results, containing consolidated source code metrics,
are printed to the console.
Build and run the tool with the following commands:
<source-folder> is the location of the Rust project or source files and -m is the
command-line flag to be specified. It will be src, to indicate that we want source
code metrics.
If you want to run the stats for the current project, you can do so with the following:
Note the dot (.) in the command, which indicates we want to run the command for the
current project folder.
You will see the source code metrics displayed on the terminal.
As an exercise, you can extend this shell command to generate metrics on the binary
files generated for a Rust project. To invoke this option, allow the user to specify the
-m flag as bin.
This concludes the section on developing a shell command, which demonstrated file and
directory operations in Rust.
Summary
In this chapter, we reviewed the basics of file management at the operating system
level, and the main system calls to work with files. We then learned how to use the Rust
Standard Library to open and close a file, read and write to a file, query file metadata, and
work with links. After file operations, we learned how to do directory and path operations
in Rust. In the third section, we saw how to create hard links and soft (symbolic) links
using Rust, and how to query symlinks.
We then developed a shell command that computed source code metrics for Rust source
files within a directory tree. This project illustrated how to perform various file and
directory operations in Rust using a practical example, and reinforced the concepts of the
Rust Standard Library for file I/O operations.
Continuing with the topic of I/O, in the next chapter, we will learn the basics of terminal
I/O and the features Rust provides to work with pseudo terminals.
7
Implementing
Terminal I/O in Rust
In the previous chapter, we looked at how to work with files and directories. We also built
a shell command in Rust that generates consolidated source code metrics for Rust source
files in a project directory.
In this chapter, we will look at building terminal-based applications in Rust. Terminal
applications are an integral part of many software programs, including games, text
editors, and terminal emulators. For developing these types of programs, it helps to
understand how to build customized terminal interface-based applications. This is the
focus of this chapter.
For this chapter, we will review the basics of how terminals work, and then look at how
to perform various types of actions on a terminal, such as setting colors and styles,
performing cursor operations (such as clearing and positioning), and working with
keyboard and mouse inputs.
The bulk of this chapter will be dedicated to explaining these concepts through a practical
example. We will build a mini text viewer that will demonstrate key concepts of working
with terminals. The text viewer will be able to load a file from disk and display its contents
on the terminal interface. It will also allow a user to scroll through the contents using the
various arrow keys on the keyboard, and display information on the header and footer bar.
Technical requirements
The Git repo for the code in this chapter can be found at https://github.com/PacktPublishing/Practical-System-Programming-for-Rust-Developers/tree/master/Chapter07/tui.
For those working on the Windows platform, a virtual machine needs to be installed for
this chapter, as the third-party crate used for terminal management does not support
the Windows platform (at the time of writing this book). It is recommended to install a
virtual machine such as VirtualBox or equivalent running Linux for working with the
code in this chapter. Instructions to install VirtualBox can be found on the official
VirtualBox website.
For working with terminals, Rust provides several features to read keypresses and to
control standard input and standard output for a process. When a user types characters in
the command line, the bytes generated are available to the program when the user presses
the Enter key. This is useful for several types of programs. But for some types of programs,
such as games or text editors, which require more fine-grained control, the program must
process each character as it is typed by the user, which is also known as raw mode. There
are several third-party crates available that make raw mode processing easy. We will be
using one such crate, Termion, in this chapter.
Characteristics of terminals
Terminals are devices with which users can interact with a computer. Using a terminal,
a user can get command-line access to interact with the computer's operating system.
A shell typically acts as the controlling program to drive the terminal on one hand and
the interface with the operating system on the other hand.
Originally, UNIX systems were accessed using a terminal (also called a console)
connected to a serial line. These terminals typically had a 24-row by 80-column
character-based interface, or in some cases, rudimentary graphics capabilities. In
order to perform operations on the terminal, such as clearing the screen or moving the
cursor, specific escape sequences were used.
There are two modes in which terminals can operate:
• Canonical mode: In canonical mode, the inputs from the user are processed line
by line, and the user has to press the Enter key for the characters to be sent to the
program for processing.
• Noncanonical or raw mode: In raw mode, terminal input is not collected into lines,
but the program can read each character as it is typed by the user.
Terminals can be either physical devices or virtual devices. Most terminals today are
pseudo-terminals, which are virtual devices that are connected to a terminal device
on one end, and to a program that drives the terminal device on the other.
Pseudo-terminals help us write programs where a user on one host machine can execute
a terminal-oriented program on another host machine using network communications.
An example of a pseudo-terminal application is SSH, which allows a user to log in to
a remote host over a network.
Terminal management includes the ability to perform the following things on a
terminal screen:
In this chapter, we will use a combination of the Rust standard library and the Termion
crate to develop a terminal-oriented application. Let's see an overview of the Termion
crate in the next section.
Let's discuss a few terminal management features of the Termion crate. The official
documentation of the crate can be found on docs.rs.
The Termion crate has the following key modules:
To include the Termion crate, start a new project and add the following entry to
Cargo.toml:
[dependencies]
termion = "1.5.5"
Introducing terminal I/O fundamentals 207
A few examples of Termion usage are shown through code snippets here:
• To set the background color and then reset the background color to the original
state, use the following:
println!(
"{}Background{} ",
color::Bg(color::Cyan),
color::Bg(color::Reset)
);
We will use these terminal management features in a practical example in the upcoming
sections. Let's now define what we are going to build in this chapter.
Figure 7.1 shows the screen layout of what we will build in this chapter:
The text viewer will allow the user to perform the following actions:
A popular text viewer would have a lot more features, but this core scope provides
an adequate opportunity for us to learn about developing a terminal-oriented application
in Rust.
In this section, we've learned what terminals are and what kinds of features they support.
We also saw an overview of how to work with the Termion crate and defined what we will
be building as part of the project in this chapter. In the next section, we'll develop the first
iteration of the text viewer.
Let's start with the data structures and the main() function of the text viewer:
1. Create a new project and switch to the directory with the following command:
cargo new tui && cd tui
Here, tui stands for terminal user interface. Create a new file called
text-viewer.rs under src/bin.
3. Let's first import the required modules from the standard library and the
Termion crate:
use std::env::args;
use std::fs;
use std::io::{stdin, stdout, Write};
use termion::event::Key;
use termion::input::TermRead;
use termion::raw::IntoRawMode;
use termion::{color, style};
Three data structures are defined for the text viewer:
The document that will be displayed in the viewer is defined as a Doc struct, which
is a vector of strings.
Working with the terminal UI (size, color, styles) and cursors 211
To store cursor position x and y coordinates and to record the current size of the
terminal (the total number of rows and columns of characters), we have defined
a Coordinates struct.
The TextViewer struct is the main data structure representing the text viewer.
The number of lines contained in the file being viewed is captured in the
doc_length field. The name of the file to be shown in the viewer is recorded
in the file_name field.
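The listing with the struct definitions falls outside this extract; based on the descriptions above, they plausibly look like this (the exact field types are assumptions):

```rust
// The document being displayed: a vector of lines
struct Doc {
    lines: Vec<String>,
}

// A pair of coordinates: used both for the cursor position and
// for the terminal size (rows and columns of characters)
struct Coordinates {
    pub x: usize,
    pub y: usize,
}

// Main state of the text viewer
struct TextViewer {
    doc: Doc,
    doc_length: usize,
    cur_pos: Coordinates,
    terminal_size: Coordinates,
    file_name: String,
}

fn main() {
    let viewer = TextViewer {
        doc: Doc { lines: vec!["hello".to_string()] },
        doc_length: 1,
        cur_pos: Coordinates { x: 1, y: 1 },
        terminal_size: Coordinates { x: 80, y: 24 },
        file_name: "demo.txt".to_string(), // hypothetical file name
    };
    println!("{} line(s) in {}", viewer.doc_length, viewer.file_name);
}
```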
5. Let's now define the main() function, which is the entry point for the text
viewer application:
fn main() {
    //Get arguments from command line
    let args: Vec<String> = args().collect();
    if args.len() < 2 {
        println!("Please provide file name as argument");
        std::process::exit(0);
    }
    //Check if file exists. If not, print error message and exit process
    if !std::path::Path::new(&args[1]).exists() {
        println!("File does not exist");
        std::process::exit(0);
    }
    // Open file & load into struct
    println!("{}", termion::cursor::Show);
    // Initialize viewer
    let mut viewer = TextViewer::init(&args[1]);
    viewer.show_document();
    viewer.run();
}
impl TextViewer {
fn init(file_name: &str) -> Self {
//...
}
fn show_document(&mut self) {
// ...
}
fn run(&mut self) {
// ...
}
}
So far, we've defined the data structures and written the main() function with
placeholders for the other functions. In the next section, let's write the function to
initialize the text viewer.
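The init() listing itself falls outside this extract; a std-only sketch of what it plausibly does is shown below (struct definitions repeated for self-containment; the real code would query the terminal size, for example via termion, rather than hard-coding 80 x 24):

```rust
use std::fs;

struct Doc { lines: Vec<String> }
struct Coordinates { x: usize, y: usize }
struct TextViewer {
    doc: Doc,
    doc_length: usize,
    cur_pos: Coordinates,
    terminal_size: Coordinates,
    file_name: String,
}

impl TextViewer {
    fn init(file_name: &str) -> Self {
        // Read the whole file and split it into lines
        let contents = fs::read_to_string(file_name).unwrap();
        let lines: Vec<String> = contents.lines().map(String::from).collect();
        let doc_length = lines.len();
        TextViewer {
            doc: Doc { lines },
            doc_length,
            cur_pos: Coordinates { x: 1, y: 1 },
            // Hard-coded here; the real viewer queries the terminal
            terminal_size: Coordinates { x: 80, y: 24 },
            file_name: file_name.to_string(),
        }
    }
}

fn main() {
    fs::write("demo.txt", "line 1\nline 2\n").unwrap(); // hypothetical file
    let viewer = TextViewer::init("demo.txt");
    println!("{} line(s) in {}", viewer.doc_length, viewer.file_name); // prints "2 line(s) in demo.txt"
    let _ = fs::remove_file("demo.txt");
}
```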
We've written the initialization code for the text viewer. Next, we'll write the code
to display the document contents on the terminal screen, and also display the header
and footer.
src/bin/text-viewer.rs
fn show_document(&mut self) {
    let pos = &self.cur_pos;
    let (old_x, old_y) = (pos.x, pos.y);
    print!("{}{}", termion::clear::All, termion::cursor::Goto(1, 1));
    println!(
        "{}{}Welcome to Super text viewer\r{}",
        color::Bg(color::Black),
        color::Fg(color::White),
        style::Reset
    );
    for line in 0..self.doc_length {
        println!("{}\r", self.doc.lines[line as usize]);
    }
    println!(
        "{}",
        termion::cursor::Goto(1, (self.terminal_size.y - 2) as u16),
    );
    println!(
        "{}{} line-count={} Filename: {}{}",
        color::Fg(color::Red),
        style::Bold,
        self.doc_length,
        self.file_name,
        style::Reset
    );
    self.set_pos(old_x, old_y);
}
The code annotations for the show_document() method are described here:
1. Store the current x and y coordinates of the cursor in temporary variables.
These will be used to restore the cursor position in a later step.
2. Using the Termion crate, clear the entire screen and move the cursor to row 1 and
column 1 on the screen.
3. Print the header bar of the text viewer. A background color of black and
a foreground color of white is used to print text.
4. Display each line from the internal document buffer to the terminal screen.
5. Move the cursor to the bottom of the screen (using the terminal size y coordinate)
to print the footer.
6. Print the footer text in red and with bold style. Print the number of lines in the
document and filename to the footer.
7. Reset the cursor to the original position (which was saved to temporary variables
in step 1).
Let's look at the set_pos() helper method used by the show_document() method:
src/bin/[Link]
This helper method synchronizes the internal cursor tracking field (the cur_pos field
of the TextViewer struct) and the on-screen cursor position.
We now have the code to initialize the text viewer and to display the document on the
screen. With this, a user can open a document in the text viewer and view its contents.
But how does the user exit the text viewer? We'll find out in the next section.
src/bin/[Link]
fn run(&mut self) {
    let mut stdout = stdout().into_raw_mode().unwrap();
    let stdin = stdin();
    for c in stdin.keys() {
        match c.unwrap() {
            Key::Ctrl('q') => {
                break;
            }
            _ => {}
        }
        stdout.flush().unwrap();
    }
}
In the code shown, we use the stdin.keys() method to listen for user inputs in a
loop. stdout() is used to display text to the terminal. When Ctrl + Q is pressed, the
program exits.
Since we have not implemented scrolling yet, pass a filename to the program that has 24
lines or less of content (this is typically the default height of a standard terminal in terms
of the number of rows). You will see the text viewer open up and the header bar, footer
bar, and file contents printed to the terminal. Type Ctrl + Q to exit. Note that you have to
specify the filename with the full file path as a command-line argument.
In this section, we learned how to get the terminal size, set the foreground and
background colors, and apply bold style using the Termion crate. We also learned how to
position the cursor onscreen at specified coordinates, and how to clear the screen.
In the next section, we will look at processing keystrokes for user navigation within the
document displayed in the text editor and how to implement scrolling.
cp src/bin/[Link] src/bin/[Link]
This section is organized into three parts. First, we'll implement the logic to respond to
the following keystrokes from a user: up, down, left, right, and backspace. Next, we'll
implement the functionality to update the cursor position in internal data structures,
and simultaneously update the cursor position onscreen. Lastly, we'll allow scrolling
through a multi-page document.
We'll begin with handling user keystrokes.
src/bin/[Link]
fn run(&mut self) {
    let mut stdout = stdout().into_raw_mode().unwrap();
    let stdin = stdin();
    for c in stdin.keys() {
        match c.unwrap() {
            Key::Ctrl('q') => {
                break;
            }
            Key::Left => {
                self.dec_x();
                self.show_document();
            }
            Key::Right => {
                self.inc_x();
                self.show_document();
            }
            Key::Up => {
                self.dec_y();
                self.show_document();
            }
            Key::Down => {
                self.inc_y();
                self.show_document();
            }
            Key::Backspace => {
                self.dec_x();
            }
            _ => {}
        }
        stdout.flush().unwrap();
    }
}
The new match arms for the arrow and backspace keys are the changes to the run()
method from the earlier version. In this code, we are listening for the up, down, left,
right, and backspace keys. For any of these
keypresses, we are incrementing the x or y coordinate appropriately using one of the
following methods: inc_x(), inc_y(), dec_x(), or dec_y(). For example, if the
right arrow is pressed, the x coordinate of the cursor position is incremented using the
inc_x() method, and if the down arrow is pressed, only the y coordinate is incremented
using the inc_y() method. The changes to coordinates are recorded in the internal data
structure (the cur_pos field of the TextViewer struct). Also, the cursor is repositioned
on the screen. All these are achieved by the inc_x(), inc_y(), dec_x(), and
dec_y() methods.
After updating the cursor position, the screen is refreshed fully and repainted.
Let's look at implementing the four methods to update cursor coordinates, and reposition
the cursor on the screen.
src/bin/[Link]
fn inc_x(&mut self) {
    if self.cur_pos.x < self.terminal_size.x {
        self.cur_pos.x += 1;
    }
    println!(
        "{}",
        termion::cursor::Goto(self.cur_pos.x as u16, self.cur_pos.y as u16)
    );
}
fn dec_x(&mut self) {
    if self.cur_pos.x > 1 {
        self.cur_pos.x -= 1;
    }
    println!(
        "{}",
        termion::cursor::Goto(self.cur_pos.x as u16, self.cur_pos.y as u16)
    );
}
fn inc_y(&mut self) {
    if self.cur_pos.y < self.doc_length {
        self.cur_pos.y += 1;
    }
    println!(
        "{}",
        termion::cursor::Goto(self.cur_pos.x as u16, self.cur_pos.y as u16)
    );
}
fn dec_y(&mut self) {
    if self.cur_pos.y > 1 {
        self.cur_pos.y -= 1;
    }
    println!(
        "{}",
        termion::cursor::Goto(self.cur_pos.x as u16, self.cur_pos.y as u16)
    );
}
The structure of all four methods is similar, and each performs only two steps: first,
update the relevant coordinate in the cur_pos field (with a bounds check), and second,
reposition the on-screen cursor to the new coordinates using termion::cursor::Goto.
We now have a mechanism to update the cursor coordinates whenever the user presses
the up, down, left, right, or backspace keys. But that's not enough. The cursor should be
repositioned on the screen to the latest cursor coordinates. For this, we will have to update
the show_document() method, which we will do in the next section.
src/bin/[Link]
        }
    } else {
        for line in pos.y - (self.terminal_size.y - 3)..pos.y {
            println!("{}\r", self.lines[line as usize]);
        }
    }
}
The code annotations in the show_document() method snippet are described here:
1. First, check whether the number of lines in the input document is less than
the terminal height. If so, display all lines from the input document on the
terminal screen.
2. If the number of lines in the input document is greater than the terminal height,
we have to display the document in parts. Initially, the first set of lines from the
document are displayed onscreen corresponding to the number of rows that will fit
into the terminal height. For example, if we allocate 21 lines to the text display area,
then as long as the cursor is within these lines, the original set of lines is displayed.
If the user scrolls down further, then the next set of lines is displayed onscreen.
Run the program twice, testing two cases:
• A file where the number of lines is less than the terminal height
• A file where the number of lines is more than the terminal height
You can use the up, down, left, and right arrows to scroll through the document and see
the contents. You will also see the current cursor position (both x and y coordinates)
displayed on the footer bar. Type Ctrl + Q to exit.
This concludes the text viewer project for this chapter. You have built a functional text
viewer that can display files of any size and scroll through their contents using the arrow
keys. You can also view the current position of the cursor, along with the filename and
number of lines, in the footer bar.
We've completed the implementation of the text viewer project in this section. The text
viewer is a classic command-line application and does not have a GUI where mouse
inputs are needed. But it is important to learn how to handle mouse events in order to
develop GUI-based terminal interfaces. We'll learn how to do that in the next section.
1. We're importing the termion crate modules for switching to raw mode, detecting
the cursor position, and listening to mouse events:
use std::io::{self, Write};
use termion::cursor::{self, DetectCursorPos};
use termion::event::*;
use termion::input::{MouseTerminal, TermRead};
use termion::raw::IntoRawMode;
To ensure that previous text on the terminal screen does not interfere with this
program, let's clear the screen, as shown here:
writeln!(
stdout,
"{}{} Type q to exit.",
termion::clear::All,
termion::cursor::Goto(1, 1)
)
.unwrap();
2. Next, let's create an iterator over incoming events and listen to mouse events.
Display the location of the mouse cursor on the terminal:
for c in stdin.events() {
    let evt = c.unwrap();
    match evt {
        Event::Key(Key::Char('q')) => break,
        Event::Mouse(m) => match m {
            MouseEvent::Press(_, a, b) |
            MouseEvent::Release(a, b) |
            MouseEvent::Hold(a, b) => {
                write!(stdout, "{}", cursor::Goto(a, b)).unwrap();
                let (x, y) = stdout.cursor_pos().unwrap();
                write!(
                    stdout,
                    "{}{}Cursor is at: ({},{}){}",
                    cursor::Goto(5, 5),
                    termion::clear::UntilNewline,
                    x,
                    y,
                    cursor::Goto(a, b)
                )
                .unwrap();
            }
        },
        _ => {}
    }
    stdout.flush().unwrap();
}
In the code shown, we are listening to both keyboard events and mouse events.
Among keyboard events, we are specifically looking for the q key, which exits the
program. We are also listening to mouse events: press, release, and hold. In these
cases, we position the cursor at the specified coordinates and also print out the
coordinates to the terminal screen.
3. Run the program with the following command:
cargo run --bin mouse-events
4. Click around the screen with the mouse, and you will see the cursor position
coordinates displayed on the terminal screen. Press q to exit.
With this, we conclude the section on working with mouse events on the terminal. This
also concludes the chapter on terminal I/O management using Rust.
Summary
In this chapter, we learned the basics of terminal management by writing a mini text
viewer. We learned how to use the Termion library to get the terminal size, set the
foreground and background colors, and set styles. After this, we learned how to work
with cursors on the terminal, including clearing the screen, positioning the cursor at
a particular set of coordinates, and keeping track of the current cursor position.
We learned how to listen to user inputs and track the keyboard arrow keys for scrolling
operations, including left, right, up, and down. We wrote code to display document
contents dynamically as the user scrolls through it, keeping the constraints of the terminal
size in mind. As an exercise, you can refine the text viewer, and also add functionality to
convert the text viewer into a full-fledged editor.
Learning these features is important for writing applications such as terminal-based
games, editing and viewing tools, and terminal graphical interfaces, as well as for
building terminal-based dashboards.
In the next chapter, we will learn the basics of process management using Rust, including
starting and stopping processes and handling errors and signals.
8
Working with
Processes and
Signals
Do you know how commands are executed when you type them into a terminal interface
on your computer? Are these commands directly executed by the operating system,
or is there an intermediate program that handles them? When you run a program from
the command line in the foreground, and press Ctrl + C, who is listening to this keypress,
and how is the program terminated? How can multiple user programs be run at the same
time by the operating system? What is the difference between a program and a process?
If you are curious, then read on.
In the previous chapter, we learned how to control and alter the terminal interface that is
used to interact with the users in command-line applications.
In this chapter, we will look at processes, which are the second most popular abstraction
in systems programming after files. We'll learn what processes are, how they differ from
programs, how they are started and terminated, and how the process environment can
be controlled. This skill is necessary if you want to write systems programs such as shells,
where you want programmatic control over the life cycle of processes.
We'll also build an elementary shell program as a mini project by using the Rust Standard
Library. This will give you a practical understanding of how popular shells such as Bourne,
Bash, and zsh work under the hood, and teach you the basics of how you can build your
own customized shell environments in Rust.
We will cover these topics in the following order:
• Understanding Linux process concepts and syscalls
• Spawning processes with Rust
• Handling I/O and environment variables
• Handling panic, errors, and signals
• Writing a shell program in Rust (project)
By the end of this chapter, you will have learned how to programmatically launch new
programs as separate processes, how to set and adjust environment variables, how to
handle errors, respond to external signals, and exit the process gracefully. You will learn
how to talk to the operating system to perform these tasks using the Rust standard
library. This gives you, as a system programmer, great control over this important system
resource; that is, processes.
Technical requirements
Verify that rustc and cargo have been installed correctly with the following commands:
rustc --version
cargo --version
The Git repo for the code in this chapter can be found at https://github.com/
PacktPublishing/Practical-System-Programming-for-Rust-
Developers/tree/master/Chapter08.
Note
The section on signal handling requires a Unix-like development environment
(Unix, Linux, or macOS), as Microsoft Windows does not directly have the
concept of signals. If you work with Windows, download a virtual machine
such as Oracle VirtualBox (https://www.virtualbox.org/wiki/
Downloads) or use a Docker container to launch a Unix/Linux image to
follow along.
Understanding Linux process concepts and syscalls
When a program is started either from a command line, script, or graphical user interface,
the following steps occur:
1. The operating system (kernel) allocates virtual memory to the program (which is
also called the memory layout of the program). We saw this in Chapter 5, Memory
Management in Rust, on how virtual memory is laid out for a program in terms of
stack, heap, text, and data segments.
2. The kernel then loads the program instructions into the text segment of the
virtual memory.
3. The kernel initializes the program variables in the data segment.
4. The kernel triggers the CPU to start executing the program instructions.
5. The kernel also provides the running program with access to resources it needs,
such as files or additional memory.
The memory layout of a process (running program) was discussed in Chapter 5, Memory
Management. It is reproduced here in Figure 8.1 for reference:
• Virtual memory in which the program instructions and data are loaded, which is
represented in the program memory layout in Figure 8.1.
• A set of metadata about the running program such as the process identifier,
system resources associated with the program (such as a list of open files),
virtual memory tables, and other such information about the program. What is of
particular importance is the process ID, which uniquely identifies an instance of a
running program.
Note
The kernel itself is the process manager. It allocates process IDs to new
instances of user programs. When a system is booted up, the kernel creates
a special process called init, which is assigned a process ID of 1. The init
process terminates only when the system is shut down and cannot be killed.
All future processes are created either by the init process or one of its
descendent processes.
Thus, a program refers to instructions created by the programmer (in the
source or a machine-executable format) and a process is a running instance
of a program that uses system resources and is controlled by the kernel. As
programmers, if we want to control a running program, we will need to use
appropriate system calls to the kernel. The Rust standard library wraps these
system calls into neat APIs for use within Rust programs, as discussed in
Chapter 3, Introduction to the Rust Standard Library.
We've seen how programs relate to processes. Let's discuss some more details about the
characteristics of processes in the next section.
Figure 8.2 shows the key set of tasks related to process management:
In Unix/Linux, the syscall to create a new child process (fork()) is different from the
one needed to load a new program into the child process and execute it (exec()).
However, the Rust standard library simplifies this for us and provides a uniform
interface, where both these steps can be combined while creating a new child process.
We'll see examples of this in the next section.
Let's go back to the question at the beginning of the chapter: What exactly happens when
you type something in the command line of a terminal?
When you run a program by typing the program executable name in a command line, two
things take place:
• The shell (itself a running process) creates a copy of itself as a new child process,
using the fork() syscall.
• The child process then loads the named executable into its own memory and runs it,
using the exec() family of syscalls.
Terminating a process
A process can terminate itself by using the exit() syscall, or by being killed by a signal
(such as the user pressing Ctrl + C) or using the kill() syscall. Rust also has an exit()
call for this purpose. Rust also provides other ways to abort a process, which we will look
at in a later section.
Handling signals
Signals are used to communicate asynchronous events such as keyboard interrupts to
a process. Except for two of the signals, SIGSTOP and SIGKILL, processes can either
choose to ignore signals or decide how to respond to them in their own way. Handling
signals directly using the Rust standard library is not developer-friendly, so for this,
we can use external crates. We'll be using one such crate in a later section.
In this section, we've seen the differences between a program and a process, delved into
a few of the characteristics of Linux processes, and got an overview of the kind of things
we can do in Rust to interact with processes.
In the next section, we'll learn first-hand how to spawn, interact with, and terminate
processes using Rust by writing some code. Note that in the next few sections, only code
snippets are provided. In order to execute the code, you will need to create a new cargo
project and add the code shown to the project's src/main.rs file, with the appropriate
module imports.
use std::process::Command;
fn main() {
    Command::new("ls")
        .spawn()
        .expect("ls command failed to start");
}
The code shown uses the Command::new() method to create a new command for
execution, which takes as a parameter the name of the program to be run. The spawn()
method creates a new child process.
If you run this program, you will see a listing of files in the current directory.
This is the simplest way to spin off a standard Unix shell command or a user program as
a child process using the Rust standard library.
What if you would like to pass parameters to the shell command? Some example code is
shown in the following snippet that passes arguments to the command:
use std::process::Command;
fn main() {
    Command::new("ls")
        .arg("-l")
        .arg("-h")
        .spawn()
        .expect("ls command failed to start");
}
The arg() method can be used to pass one argument to the program. Here we want to
run the ls -lh command to display files in a long format with human-readable file sizes.
We have to use the arg() method twice to pass the two flags.
Alternatively, the args() method can be used as shown here. Note that the
std::process import and the main() function declaration have been removed
in future code snippets to avoid repetition, but you must add them before you can run
the program:
Command::new("ls")
    .args(&["-l", "-h"])
    .spawn()
    .unwrap();
Let's alter the code to list the directory contents for the directory one level above
(relative to the current directory).
The code shows two parameters for the ls command configured through the
args() method.
Next, let's set the current directory for the child process to a non-default value:
Command::new("ls")
    .current_dir("..")
    .args(&["-l", "-h"])
    .spawn()
    .expect("ls command failed to start");
In the preceding code, we are spawning the process to run the ls command in the
directory one level above.
Run the program with the following command:
cargo run
We've so far used spawn() to create a new child process. This method returns a handle to
the child process.
There is another way to spawn a new process using output(). The difference is that
output() spawns the child process and waits for it to terminate. Let's see an example:
We are spawning a new process using the output() method to print out the contents of
a file named [Link]. Let's create this file using the following command:
If you run the program, you will see the contents of the [Link] file printed out to the
terminal. Note that we are printing out the contents of the standard output handle of
the child process because that's where the output of the cat command is directed to by
default. We'll learn more details of how to work with child processes' stdin and stdout
later in this chapter.
We'll now look at how to terminate a process.
Terminating processes
We've seen how to spawn new processes. What about terminating them? For this, the Rust
standard library provides two methods—abort() and exit().
The usage of the abort() method is shown in the following snippet:
use std::process;
fn main() {
println!("Going to abort process");
process::abort();
// This statement will not get executed
println!("Process aborted");
}
This code aborts the current process, and the last statement will not get printed.
There is another exit() method similar to abort(), but it allows us to specify an exit
code that is available to the calling process.
What is the benefit of processes returning error codes? A child process can fail due to
various errors. When the program fails and the child process exits, it would be useful
to the calling program or user to know the error code denoting the reason for failure.
0 indicates a successful exit. Other error codes indicate various conditions such as data
error, system file error, I/O error, and so on. The error codes are platform-specific, but
most Unix-like platforms use 8-bit error codes, allowing for error values between 0 and
255. Examples of error codes for Unix BSD can be found at https://www.freebsd.
org/cgi/man.cgi?query=sysexits&apropos=0&sektion=0&manpath=Fre
eBSD+11.2-stable&arch=default&format=html.
The following is an example showing the returning of error codes from a process with the
exit() method:
use std::process;
fn main() {
    println!("Going to exit process with error code 64");
    process::exit(64);
    // execution never gets here
    println!("Process exited");
}
Run this program on the command line in your terminal. To see the exit code of the last
executed process on Unix-like systems, you can type echo $? on the command line. Note
that this command may vary depending on the platform.
In this section, we've seen how to spawn and terminate processes. Let's next take a look at
how to check the status of execution of a child process after it has been spawned.
use std::process::Command;
fn main() {
    let status = Command::new("cat")
        .arg("[Link]")
        .status()
        .expect("failed to execute cat");
    if status.success() {
        println!("Successful operation");
    } else {
        println!("Unsuccessful operation");
    }
}
Run this program and you will see the message Unsuccessful operation printed out to
your terminal. Re-run the program with a valid filename and you will see the success
message printed.
This concludes this section. You learned different ways to run commands in a separate
child process, how to terminate them, and how to get the status of their execution.
In the next section, we'll look at how to set environment variables and work with I/O for
child processes.
Take the example of a load balancer that is tasked with spawning new workers (Unix
processes) in response to incoming requests. Let's assume the new worker process
reads configuration parameters from environment variables to perform its tasks. The
load balancer process then would need to spawn the worker process and also set its
environment variables. Likewise, there may be another situation where the parent process
wants to read a child process's standard output or standard error and route it to a log file.
Let's understand how to perform such activities in Rust. We'll start with handling the I/O
of the child process.
use std::io::prelude::*;
use std::process::{Command, Stdio};
fn main() {
    // Spawn the `ps` command
    let process = match Command::new("ps")
        .stdout(Stdio::piped())
        .spawn()
    {
        Err(err) => panic!("couldn't spawn ps: {}", err),
        Ok(process) => process,
    };
    let mut ps_output = String::new();
    match process.stdout.unwrap().read_to_string(&mut ps_output) {
        Err(err) => panic!("couldn't read ps stdout: {}", err),
        Ok(_) => print!("{}", ps_output),
    }
}
In the preceding code snippet, we first create a new child process to run the ps command
to show a list of currently running processes. The output is, by default, sent to the child
process's stdout.
In order to get access to the child process's stdout from the parent process, we create
a Unix pipe using the Stdio::piped() method. The process variable is the handle
to the child process, and process.stdout is the handle to the child process's standard
output. The parent process can read from this handle, and print out its contents to its own
stdout (that is, the parent process's stdout). This is how a parent process can read the
output of a child process.
Let's now write some code to send some bytes from the parent process to the standard
input of the child process:
let mut child = Command::new("rev")
    .stdin(Stdio::piped()) <1>
    .stdout(Stdio::piped()) <2>
    .spawn()
    .expect("failed to spawn rev command");
match child.stdin.as_mut().unwrap().write_all("palindrome".as_bytes()) {
    Err(why) => panic!("couldn't write to stdin: {}", why),
    Ok(_) => println!("sent text to rev command"),
} <3>
// Close the child's stdin so that rev sees end-of-input
drop(child.stdin.take());
let mut child_output = String::new();
match child.stdout.unwrap().read_to_string(&mut child_output) { <4>
    Err(why) => panic!("couldn't read stdout: {}", why),
    Ok(_) => println!("Output from rev command: {}", child_output),
}
The descriptions of the numbered annotations in the preceding code are provided here:
1. Register a piped connection between the parent process and standard input of
the child process.
2. Register a piped connection between the parent process and standard output of
the child process.
3. Write bytes to the standard input of the child process.
4. Read from the standard output of the child process and print it to the
terminal screen.
There are a few other methods available on the child process. The id() method provides
the process ID of the child process, the kill() method kills the child process, the
stderr method gives a handle to the child process's standard error, and the wait()
method makes the parent process wait until the child process has completely exited.
We've seen how to handle I/O for child processes. Let's now learn how to work with
environment variables.
use std::process::Command;
fn main() {
    Command::new("env")
        .env("MY_PATH", "/tmp")
        .spawn()
        .expect("Command failed to execute");
}
The env() method on std::process::Command allows the parent process to set an
environment variable for the child process being spawned. Run the program with cargo
run, and you'll see the value of the MY_PATH environment variable that was set in the
program.
To set multiple environment variables, the envs() command can be used.
The environment variables for a child process can be cleared by using the env_clear()
method, as shown:
Command::new("env")
    .env_clear()
    .spawn()
    .expect("Command failed to execute");
Run the program with cargo run, and you will see that nothing is printed out for the
env command. Re-run the program after commenting out the .env_clear() statement,
and you will find the env values printed to the terminal.
To remove a specific environment variable, the env_remove() method can be used.
With this, we conclude this section. We've seen how to interact with standard input and
standard output of a child process and to set/reset the environment variables. In the next
section, we'll learn how to handle errors and signals in child processes.
Note
In cases when processes exit due to errors, the operating system itself performs
some cleanup, such as releasing memory, closing network connections, and
releasing any file handles associated with the process. But sometimes, you may
want program-driven controls to handle these cases.
Failures in process execution can broadly be classified into two types – unrecoverable
errors and recoverable errors. When a process encounters an unrecoverable error, there is
sometimes no option but to abort the process. Let's see how to do that.
Run the program with cargo run and you will see the error message printed out
from the panic! macro. There is also a custom hook that can be registered that will get
invoked before the standard cleanup is performed by the panic macro. Here is the same
example, this time with a custom panic hook:
use std::panic;
use std::process::{Stdio, Command};
fn main() {
    panic::set_hook(Box::new(|_| {
        println!("This is an example of custom panic hook, which is invoked on thread panic, but before the panic run-time is invoked")
    }));
    let _child_process = match Command::new("invalid-command")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
    {
        Err(err) => panic!("Normal panic message {}", err),
        Ok(new_process_handle) => new_process_handle,
    };
}
On running this program, you will see the custom error hook message displayed, as we are
providing an invalid command to spawn as a child process.
Note that panic! should be used only for non-recoverable errors. For example, if a child
process tries to open a file that does not exist, this can be handled using a recoverable
error mechanism such as the Result enum. The advantage of using Result is that the
program can return to its original state and the failed operation can be retried. If panic!
is used, the program terminates abruptly, and the original state of the program cannot be
recovered. But there are situations where panic! may be appropriate, such as when
a process runs out of memory in the system.
Let's next look at another aspect of process control—signal handling.
Signal handling
In Unix-like systems, the operating system can send signals to processes. Note that
Windows OS does not have signals. The process can handle the signal in a way it deems
fit, or even ignore the signal. There are operating-system defaults for handling various
signals. For example, when you issue a kill command on a process from a shell, the
SIGTERM signal is generated. The program terminates on receipt of this signal by default,
and there is no special additional code that needs to be written in Rust to handle that
signal. Similarly, a SIGINT signal is received when a user presses Ctrl + C. But a Rust
program can choose to handle these signals in its own way.
However, handling Unix signals correctly is hard for various reasons. For example,
a signal can occur at any time and the thread processing cannot continue until the signal
handler completes execution. Also, signals can occur on any thread and synchronization
is needed. For this reason, it is better to use third-party crates in Rust for signal handling.
Note that even while using external crates, caution should be exercised as the crates do not
solve all problems associated with signal handling.
Let's now see an example of handling signals using the signal-hook crate. Add it to
dependencies in [Link] as shown:
[dependencies]
signal-hook = "0.1.16"
use signal_hook::iterator::Signals;
use std::io::Error;
fn main() -> Result<(), Error> {
    let signals = Signals::new(&[signal_hook::SIGTERM,
        signal_hook::SIGINT])?;
    'signal_loop: loop {
        // Pick up signals that arrived since last time
        for signal in signals.pending() {
            match signal {
                signal_hook::SIGINT => {
                    println!("Received signal SIGINT");
                }
                signal_hook::SIGTERM => {
                    println!("Received signal SIGTERM");
                    break 'signal_loop;
                }
                _ => unreachable!(),
            }
        }
    }
    println!("Terminating program");
    Ok(())
}
In the preceding code, we listen for two specific signals, SIGTERM and SIGINT,
within the match clause. SIGINT can be sent to the program by pressing Ctrl + C.
The SIGTERM signal can be generated by using the kill command on a process id
from the shell.
Now, run the program and simulate the two signals. Then, press the Ctrl + C key
combination, which generates the SIGINT signal. You will see that instead of the default
behavior (which is to terminate the program), a statement is printed out to the terminal.
To simulate SIGTERM, run a ps command on the command line of a Unix shell and
retrieve the process id. Then run a kill command with the process id. You will see that
the process terminates, and a statement is printed to the terminal.
Note
If you are using tokio for asynchronous code, you can use the tokio-support
feature of signal-hook.
It is important to remember that signal handling is a complex topic, and even with
external crates, care must be exercised while writing custom signal-handling code.
While handling signals or dealing with errors, it is also good practice to log the signal
or error using a crate such as log for future reference and troubleshooting by system
administrators. However, if you'd like a program to read these logs, you can log these
messages in JSON format instead of plaintext by using an external crate such as
serde_json.
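As a minimal sketch of this idea, a structured log line can be assembled as JSON. The field names below are purely illustrative, and the sketch uses only the standard library; a real program would typically route such entries through the log crate, with serde_json doing the serialization:

```rust
// Build a JSON-formatted log entry for a received signal. The schema
// (level/signal/message) is illustrative, not a standard.
fn log_event_json(level: &str, signal: &str, message: &str) -> String {
    format!(
        "{{\"level\":\"{}\",\"signal\":\"{}\",\"message\":\"{}\"}}",
        level, signal, message
    )
}

fn main() {
    // In a signal-handling loop, this line could run on receipt of SIGTERM
    let entry = log_event_json("INFO", "SIGTERM", "received termination signal");
    println!("{}", entry);
}
```

A log-processing program can then parse each line as a JSON object instead of scraping free-form text.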
This concludes this subsection on working with panic, errors, and signals in Rust. Let's
now write a shell program that demonstrates some of the concepts discussed.
Writing a shell program in Rust (project)
[[bin]]
name = "iter1"
path = "src/iter1.rs"
[[bin]]
name = "iter2"
path = "src/iter2.rs"
[[bin]]
name = "iter3"
path = "src/iter3.rs"
In the preceding code, we specify to the Cargo tool that we want to build separate
binaries for the three iterations.
We're now ready to start with the first iteration of the shell program.
src/iter1.rs
use std::io::Write;
use std::io::{stdin, stdout};
use std::process::Command;
fn main() {
loop {
print!("$ "); <1>
stdout().flush().unwrap(); <2>
let mut user_input = String::new(); <3>
stdin()
.read_line(&mut user_input) <4>
.expect("Unable to read user input");
let command_to_execute = user_input.trim(); <5>
let mut child = Command::new(command_to_execute) <6>
.spawn()
.expect("Unable to execute command");
child.wait().unwrap(); <7>
}
}
6. Create a new child process and pass the user commands to the child process
for execution.
7. Wait until the child process completes execution before accepting additional
user inputs.
8. Run the program with the following command:
cargo run --bin iter1
src/iter2.rs
let command_args: Vec<&str> = command_to_execute
.split_whitespace()
.collect(); <1>
let mut child = Command::new(command_args[0]) <2>
.args(&command_args[1..]) <3>
.spawn()
.expect("Unable to execute command");
child.wait().unwrap();
}
}
The code shown is essentially the same as the previous snippet, except for the three
additional lines added, which are annotated with numbers. The annotations are described
as follows:
1. Take the user input, split it by whitespace, and store the result in Vec.
2. The first element of the Vec corresponds to the command. Create a child process to
execute this command.
3. Pass the list of Vec elements, starting from the second element, as a list of
arguments to the child process.
4. Run the program with the following line:
cargo run --bin iter2
5. Enter a command and pass arguments to it before hitting the Enter key. For
example, you can type one of the following commands:
ls -lah
ps -ef
cat [Link]
Note that in the last command, [Link] is an existing file holding some contents and
located in the project root folder.
You will see the command outputs successfully displayed on the terminal. The shell works
so far as we intended. Let's extend it now a little further in the next iteration.
show files
This is what we'll code next. The following snippet shows the code. Let's look at the
module imports first:
use std::io::Write;
use std::io::{stdin, stdout};
use std::io::{Error, ErrorKind};
use std::process::Command;
Modules from std::io are imported for writing to the terminal, reading from the
terminal, and for error handling. We already know the purpose of importing the
process module.
Let's now look at the main() program in parts. We won't cover the code already seen
in previous iterations. The complete code for the main() function can be found in the
GitHub repo in the src/[Link] file:
1. After displaying the $ prompt, check whether the user has entered any command.
If the user presses just the Enter key at the prompt, ignore and redisplay the $
prompt. The following code checks whether at least one command has been
entered by the user, then processes the user input:
if command_args.len() > 0 {..}
_ => Err(Error::new(
ErrorKind::InvalidInput,
"please enter valid command",
)),
},
"show" if command_args.len() == 1 =>
Err(Error::new(
ErrorKind::InvalidInput,
"please enter valid command",
)),
"quit" => std::process::exit(0),
_ => Command::new(command_args[0])
.args(&command_args[1..])
.spawn(),
};
3. Wait for the child process to complete. If the child process fails to execute
successfully, or if the user input is invalid, throw an error:
match child {
Ok(mut child) => {
if child.wait().unwrap().success() {
} else {
println!("\n{}", "Child process failed")
}
}
Err(e) => match e.kind() {
ErrorKind::InvalidInput => eprintln!(
"Sorry, show command only supports following options: files, process"
),
_ => eprintln!("Please enter a valid command"),
},
}
4. Run the program with cargo run --bin iter3 and try the following
commands at the $ prompt to test:
show files
show process
du
You'll see the commands successfully execute, with a statement printed out indicating
success.
You may have noticed that we've added some error handling in the code. Let's look at
the error conditions we've addressed:
show memory
show
invalid-command
In this section, we've written a shell program that has a subset of the features of
a real-world shell program such as zsh or bash. To be clear, a real-world shell program
has a lot more complex features, but we have covered the fundamental concepts behind
creating a shell program here. Also importantly, we've learned how to handle errors in
case of invalid user inputs or if a child process fails. To internalize your learning, it is
recommended to write some code for the suggested exercises.
This concludes the section on writing a shell program in Rust.
Summary
In this chapter, we reviewed the basics of processes in Unix-like operating systems.
We learned how to spawn a child process, interact with its standard input and standard
output, and execute a command with its arguments. We also saw how to set and clear
environment variables. We looked at the various ways to terminate a process on error
conditions, and how to detect and handle external signals. We finally wrote a shell
program in Rust that can execute the standard Unix commands, but also accept a couple
of commands in a natural-language format. We also handled a set of errors to make the
program more robust.
Continuing on the topic of managing system resources, in the next chapter, we will learn
how to manage threads of a process and build concurrent systems programs in Rust.
9
Managing
Concurrency
Concurrent systems are all around us. When you download a file, listen to streaming
music, initiate a text chat with a friend, and print something in the background on your
computer, all at the same time, you are experiencing the magic of concurrency in action.
The operating system manages all these for you in the background, scheduling tasks across
available processors (CPUs).
But do you know how to write a program that can do multiple things at the same time? More
importantly, do you know how to do it in a way that is both memory- and thread-safe, while
ensuring optimal use of system resources? Concurrent programming is one way to achieve
this. But concurrent programming is considered to be a difficult topic in most programming
languages due to challenges in synchronizing tasks and sharing data safely across multiple
threads of execution. In this chapter, you'll learn about the basics of concurrency in Rust and
how Rust makes it easier to prevent common pitfalls and enables us to write concurrent
programs in a safe manner. This chapter is structured as shown here:
• Reviewing concurrency basics
• Spawning and configuring threads
• Error handling in threads
• Message passing between threads
• Achieving concurrency with shared state
• Pausing thread execution with timers
By the end of this chapter, you'll have learned how to write concurrent programs in Rust
by spawning new threads, handling thread errors, transferring and sharing data safely
across threads to synchronize tasks, understanding the basics of thread-safe data types,
and pausing the execution of current threads for synchronization.
Technical requirements
Verify that rustup, rustc, and cargo have been installed correctly with the
following commands:
rustup --version
rustc --version
cargo --version
The Git repo for the code in this chapter can be found at: https://github.com/PacktPublishing/Practical-System-Programming-for-Rust-Developers/tree/master/Chapter09.
Let's get started with some basic concepts of concurrency.
We've so far seen a few use cases that require multiple tasks to be performed
simultaneously. But there is also a technical reason that is driving concurrency in
programming, which is that CPU clock speeds on a single core are hitting upper practical
limits. So, it is becoming necessary to add more CPU cores, and more processors on a
single machine. This is in turn driving the need for software that can efficiently utilize
the additional CPU cores. To achieve this, portions of a program should be executable
concurrently on different CPU cores, rather than being constrained by the sequential
execution of instructions on a single CPU core.
These factors have resulted in the increased use of multi-threading concepts in
programming. Here, there are two related terms that need to be understood – concurrency
and parallelism. Let's take a closer look at this.
Figure 9.1 shows three different computation scenarios within a Unix/Linux process:
• Sequential execution: Let's assume that a process has two tasks A and B. Task
A has three subtasks A1, A2, and A3, which are executed sequentially. Likewise,
Task B has two subtasks, B1 and B2, that are executed one after the other. Overall, the
process executes all the subtasks of Task A before taking on those of Task B. There is
a challenge in this model. Assume the case where task A2 involves waiting for an
external network or user input, or for a system resource to become available. Here,
all tasks lined up after task A2 will be blocked until A2 completes. This is not an
efficient use of the CPU and causes a delay in the completion of all the scheduled
tasks that belong to the process.
• Concurrent execution: Sequential programs are limited as they do not have
the ability to deal with multiple simultaneous inputs. This is why many modern
applications are concurrent, with multiple threads of execution running simultaneously.
In the concurrent model, the process interleaves the tasks, that is, alternates
between the execution of Task A and Task B, until both of them are complete. Here,
even if A2 is blocked, it allows progress with the other sub-tasks. Each sub-task, A1,
A2, A3, B1, and B2, can be scheduled on separate execution threads. These threads
could run either on a single processor or scheduled across multiple processor
cores. One thing to bear in mind is that concurrency is about order-independent
computations as opposed to sequential execution, which relies on steps executed
in a specific order to arrive at the correct program outcome. Writing programs to
accommodate order-independent computations is more challenging than writing
sequential programs.
• Parallel execution: This is a variant of the concurrent execution model. In this
model, the process executes Task A and Task B truly in parallel, on separate CPU
processors or cores. This assumes, of course, that the software is written in a way
that such parallel execution is possible, and there are no dependencies between Task
A and Task B that could stall the execution or corrupt the data.
Parallel computing is a broad term. Parallelism can be achieved either within a
single machine by having multi-cores or multi-processors or there can be clusters
of different computers that can cooperatively perform a set of tasks.
In this section, we've seen two ways to write concurrent programs – concurrency
and parallelism, and how these differ from sequential models of execution. Both these
models use multi-threading as the foundational concept. Let's talk more about this in
the next section.
Concepts of multi-threading
In this section, we'll deep-dive into how multi-threading is implemented in Unix.
Unix supports threads as a mechanism for a process to perform multiple tasks
concurrently. A Unix process starts up with a single thread, which is the main thread
of execution. But additional threads can be spawned that can execute concurrently in a
single-processor system, or execute in parallel in a multi-processor system.
Each thread has access to its own stack for storing its own local variables and function
parameters. Threads also maintain their own register state including the stack pointer and
program counter. All the threads in a process share the same memory address space, which
means that they share access to the data segments (initialized data, uninitialized data, and
the heap). Threads also share the same program code (process instructions).
The program code, however, is common to all the threads. Each thread can execute a
different section of the code from the program text segment, and store the local variables
and function parameters within their respective thread stack. When it is the turn of
a thread to execute, its program counter (containing the address of the instruction to
execute) is loaded for the CPU to execute the set of instructions for a given thread.
In the example shown in the diagram, if task A2 is blocked waiting for I/O, then the CPU
will switch execution to another task such as B1 or A1.
With this, we conclude the section on concurrency and multi-threading basics. We are now
ready to get started with writing concurrent programs using the Rust Standard Library.
Spawning and configuring threads
use std::thread;
fn main() {
for _ in 1..5 {
thread::spawn(|| {
println!("Hi from thread id {:?}",
thread::current().id());
});
}
}
Run this program a few times. You may see fewer than four lines printed, or sometimes none at all, because the main thread can exit before the spawned child threads get a chance to execute. To fix this, we can capture the handle returned by thread::spawn() and join each child thread to the main thread:
use std::thread;
fn main() {
let mut child_threads = Vec::new();
for _ in 1..5 {
let handle = thread::spawn(|| {
println!("Hi from thread id {:?}",
thread::current().id());
});
child_threads.push(handle);
}
for i in child_threads {
i.join().unwrap();
}
}
The changes from the previous program are highlighted. thread::spawn() returns
a thread handle that we're storing in a Vec collection data type. Before the end of the
main() function, we join each child thread to the main thread. This ensures that the
main() function waits until the completion of all the child threads before it exits.
Let's run the program again. You'll notice four lines printed, one for each thread. Run the
program a few more times. You'll see four lines printed every time. This is progress. It
shows that joining the child threads to the main threads is helping. However, the order of
thread execution (as seen by the order of print outputs on the terminal) varies with each
run. This is because, when we spawn multiple child threads, there is no guarantee of the
order in which the threads are executed. This is a feature of multi-threading (as discussed
earlier), not a bug. But this is also one of the challenges of working with threads, as this
brings difficulties in synchronizing activities across threads. We'll learn how to address
this a little later in the chapter.
We've so far seen how to use the thread::spawn() function to create a new thread.
Let's now see the second way to create a new thread.
The thread::spawn() function uses default parameters for thread name and stack
size. If you'd like to set them explicitly, you can use thread::Builder. This is a thread
factory that uses the Builder pattern to configure the properties of a new thread. The
previous example has been rewritten here using the Builder pattern:
use std::thread;
fn main() {
let mut child_threads = Vec::new();
for i in 1..5 {
let builder = thread::Builder::new().name(format!(
"mythread{}", i));
let handle = builder
.spawn(|| {
println!("Hi from thread id {:?}",
thread::current().name().unwrap());
})
.unwrap();
child_threads.push(handle);
}
for i in child_threads {
i.join().unwrap();
}
}
The changes are highlighted in the code. We are creating a new builder object by using
the new() function, and then configuring the name of the thread using the name()
method. We're then using the spawn() method on an instance of the Builder
pattern. Note that the spawn() method returns a JoinHandle type wrapped in
io::Result<JoinHandle<T>>, so we have to unwrap the return value of the method
to retrieve the thread handle.
Run the code and you'll see the four thread names printed to your terminal.
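The example above sets only the thread name, but the same builder can configure the stack size too. Here is a brief sketch (the 2 MB value is just an illustrative choice, not a recommendation):

```rust
use std::thread;

fn main() {
    // Configure both a name and an explicit stack size (in bytes) for the new thread
    let builder = thread::Builder::new()
        .name("worker".to_string())
        .stack_size(2 * 1024 * 1024); // 2 MB

    let handle = builder
        .spawn(|| thread::current().name().map(String::from))
        .unwrap();

    // The closure returns the thread's own name, confirming the builder applied it
    println!("spawned: {:?}", handle.join().unwrap());
}
```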
We've so far seen how to spawn new threads. Let's now take a look at error handling while
working with threads.
Error handling in threads
use std::fs;
use std::thread;
fn copy_file() -> thread::Result<()> {
thread::spawn(|| {
fs::copy("[Link]", "[Link]").expect("Error occurred");
})
.join()
}
fn main() {
match copy_file() {
Ok(_) => println!("Ok. copied"),
Err(_) => println!("Error in copying file"),
}
}
We have a function, copy_file(), that copies a source file to a destination file. This
function returns a thread::Result<()> type, which we are unwrapping using a
match statement in the main() function. If the copy_file() function returns a
Result::Err variant, we handle it by printing an error message.
Run the program with cargo run with an invalid source filename. You will see the error
message: Error in copying file printed to the terminal. If you run the program with a
valid source filename, it will match the Ok() branch of the match clause, and the success
message will be printed.
This example shows us how to handle errors propagated by a thread in the calling
function. What if we want a way to recognize that the current thread is panicking, even
before the error is propagated to the calling function? The Rust Standard Library has a function,
thread::panicking(), available in the std::thread module for this. Let's learn
how to use it by modifying the previous example:
use std::fs;
use std::thread;
struct Filenames {
source: String,
destination: String,
}
impl Drop for Filenames {
fn drop(&mut self) {
if thread::panicking() {
println!("dropped due to panic");
} else {
println!("dropped without panic");
}
}
}
fn copy_file(file_struct: Filenames) -> thread::Result<()> {
thread::spawn(move || {
fs::copy(&file_struct.source,
&file_struct.destination).expect("Error occurred");
})
.join()
}
fn main() {
let foo = Filenames {
source: "[Link]".into(),
destination: "[Link]".into(),
};
match copy_file(foo) {
Ok(_) => println!("Ok. copied"),
Err(_) => println!("Error in copying file"),
}
}
We've created a struct, Filenames, which contains the source and destination
filenames to copy. We're initializing the source filename with an invalid value. We're
also implementing the Drop trait for the Filenames struct, which gets called when an
instance of the struct goes out of scope. In this Drop trait implementation, we are using
the thread::panicking() function to check if the current thread is panicking, and
are handling it by printing out an error message. The error is then propagated to the main
function, which also handles the thread error and prints out another error message.
Run the program with cargo run and an invalid source filename, and you will see the
following messages printed to your terminal:
Also, note the use of the move keyword in the closure supplied to the spawn()
function. This is needed for the thread to transfer ownership of the file_struct data
structure from the main thread to the newly spawned thread.
We've seen how to handle thread panic in the calling function and also how to detect if
the current thread is panicking. Handling errors in child threads is very important to
ensure that the error is isolated and does not bring the whole process down. Hence special
attention is needed to design error handling for multi-threaded programs.
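To make this concrete, here is a brief sketch (with hypothetical worker logic) showing how joining each child thread and matching on its Result keeps a panic in one worker from bringing down the rest of the process:

```rust
use std::thread;

fn main() {
    // Spawn three workers; the one with i == 1 deliberately panics
    let handles: Vec<_> = (0..3)
        .map(|i| {
            thread::spawn(move || {
                if i == 1 {
                    panic!("worker {} failed", i);
                }
                i * 10
            })
        })
        .collect();

    // join() returns thread::Result: Err for a panicked thread, Ok otherwise.
    // The panic stays contained in its worker; the process keeps running.
    for handle in handles {
        match handle.join() {
            Ok(value) => println!("worker returned {}", value),
            Err(_) => println!("a worker panicked; error isolated and handled"),
        }
    }
}
```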
Next, we'll move on to the topic of how to synchronize computations across threads,
which is an important aspect of writing concurrent programs.
Message passing between threads
One way to ensure program correctness in the face of the unpredictable ordering of thread
execution is to introduce mechanisms for synchronizing activities across threads. One
such model for concurrent programming is message-passing concurrency. It is a way to
structure the components of a concurrent program. In our case, concurrent components
are threads (but they can also be processes). The Rust Standard Library has implemented
a message-passing concurrency solution called channels. A channel is basically like a pipe,
with two parts – a producer and a consumer. The producer puts a message into a channel,
and a consumer reads from the channel.
Many programming languages implement the concept of channels for inter-thread
communications. But Rust's implementation of channels has a special property – multiple
producer single consumer (mpsc). This means, there can be multiple sending ends but only
one consuming end. Translate this to the world of threads: we can have multiple threads
that send values into a channel, but there can be only one thread that can receive and
consume these values. Let's see how this works with an example that we'll build out step
by step. The complete code listing is also provided in the Git repo for the chapter under
src/[Link]:
1. Let's first declare the module imports – the mpsc and thread modules from the
standard library:
use std::sync::mpsc;
use std::thread;
"four".into()];
for num in num_vec {
transmitter1.send(num).unwrap();
}
});
6. Spawn a second thread moving the transmission handle transmitter2 into the
thread closure. Inside this thread, send another bunch of values into the channel
using the transmission handle:
thread::spawn(move || {
let num_vec: Vec<String> =
vec!["Five".into(), "Six".into(),
"Seven".into(), "eight".into()];
for num in num_vec {
transmitter2.send(num).unwrap();
}
});
7. In the main thread of the program, use the receiving handle of the channel to
consume the values being written into the channel by the two child threads:
for received_val in receiver {
println!("Received from thread: {}",
received_val);
}
"four".into()];
for num in num_vec {
transmitter1.send(num).unwrap();
}
});
thread::spawn(move || {
let num_vec: Vec<String> =
vec!["Five".into(), "Six".into(),
"Seven".into(), "eight".into()];
for num in num_vec {
transmitter2.send(num).unwrap();
}
});
for received_val in receiver {
println!("Received from thread: {}",
received_val);
}
}
8. Run the program with cargo run. (Note: If you are running code from the Packt
Git repo, use cargo run --bin message-passing). You'll see the values
printed out in the main program thread, which are sent from the two child threads.
Each time you run the program, you may get a different order in which the values
are received, as the order of thread execution is non-deterministic.
To summarize, Mutex ensures that at most one thread is able to access some data at one
time, while Arc enables shared ownership of some data and prolongs its lifetime until all
the threads have finished using it.
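As a quick illustration of the two types working together, here is a minimal sketch of that simple shared-counter case:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc gives shared ownership across threads; Mutex guards the data inside it
    let counter = Arc::new(Mutex::new(0u32));
    let mut handles = vec![];
    for _ in 0..10 {
        // Each thread gets its own clone of the Arc pointer to the same Mutex
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            // Only one thread at a time can hold the lock and mutate the value
            *counter.lock().unwrap() += 1;
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    println!("Final count: {}", counter.lock().unwrap());
}
```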
Let's see the usage of Mutex with Arc to demonstrate shared-state concurrency
with a step-by-step example. This time, we'll write a more complex example than just
incrementing a shared counter value across threads. We'll take the example we wrote in
Chapter 6, Working with Files and Directories in Rust, to compute source file stats for all
Rust files in a directory tree, and modify it to make it a concurrent program. We'll define
the structure of the program in the next section. The complete code for this section can be
found in the Git repo under src/[Link].
3. Within the main() function, create a new instance of SrcStats, protect it with a
Mutex lock, and then wrap it inside an Arc type:
4. Read the [Link] file, and store the individual entries in a vector:
let mut dir_list = File::open("./[Link]").unwrap();
let reader = BufReader::new(&mut dir_list);
let dir_lines: Vec<_> = reader.lines().collect();
5. Iterate through the dir_lines vector, and for each entry, spawn a new thread to
perform the following two steps:
a) Accumulate the list of files from each subdirectory in the tree.
b) Then open each file and compute the stats. Update the stats in the
shared-memory struct protected by Mutex and Arc.
The overall skeletal structure of the code for this step looks like this:
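A simplified, self-contained sketch of that skeleton follows, with the sub-step A and B processing stubbed out and an in-memory list standing in for the entries read from the file:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug, Default)]
struct SrcStats {
    number_of_files: u32,
    loc: u32,
    comments: u32,
    blanks: u32,
}

fn main() {
    // Shared stats struct protected by a Mutex and wrapped in an Arc
    let stats_counter = Arc::new(Mutex::new(SrcStats::default()));

    // Stand-in for the directory entries read from the disk file
    let dir_lines = vec!["dir1".to_string(), "dir2".to_string()];

    let mut child_handles = vec![];
    for dir in dir_lines {
        // Clone the Arc pointer so each thread can update the shared struct
        let src_stats = Arc::clone(&stats_counter);
        let handle = thread::spawn(move || {
            // Sub-step A: walk `dir` and collect Rust source files (omitted here)
            // Sub-step B: compute stats and update the shared struct
            let _ = dir;
            let mut stats_pointer = src_stats.lock().unwrap();
            stats_pointer.number_of_files += 1;
        });
        child_handles.push(handle);
    }

    // Join all child threads before reading the final aggregated result
    for handle in child_handles {
        handle.join().unwrap();
    }
    println!("Source stats: {:?}", stats_counter.lock().unwrap());
}
```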
In this section, we read the list of directory entries for computing source file statistics from
a file. We then iterated through the list to spawn a thread to process each entry. In the next
section, we'll define the processing to be done in each thread.
1. In sub-step A, let's read through each subdirectory under the directory entry, and
accumulate the consolidated list of all Rust source files in the file_entries
vector. The code for sub-step A is shown. Here, we are first creating two vectors to
hold the directory and filenames respectively. Then we are iterating through the
directory entries of each item from the [Link] file, and accumulating the
entry names into the dir_entries or file_entries vector depending upon
whether it is a directory or an individual file:
let mut dir_entries = vec![PathBuf::from(dir)];
let mut file_entries = vec![];
while let Some(entry) = dir_entries.pop() {
for inner_entry in fs::read_dir(&entry).unwrap() {
if let Ok(entry) = inner_entry {
if entry.path().is_dir() {
dir_entries.push(entry.path());
} else if entry.path().extension().map(|e| e == "rs").unwrap_or(false) {
file_entries.push(entry);
}
}
}
}
At the end of sub-step A, all individual filenames are stored in the file_entries
vector, which we will use in sub-step B for further processing.
2. In sub-step B, we'll read each file from the file_entries vector, compute the
source stats for each file, and save the values in the shared memory struct. Here is
the code snippet for sub-step B:
for file in file_entries {
let file_contents = fs::read_to_string(
&file.path()).unwrap();
let mut stats_pointer = src_stats.lock().unwrap();
for line in file_contents.lines() {
if line.len() == 0 {
stats_pointer.blanks += 1;
} else if line.starts_with("//") {
stats_pointer.comments += 1;
} else {
stats_pointer.loc += 1;
}
}
stats_pointer.number_of_files += 1;
}
3. Let's again review the skeletal structure of the program shown next. We've so far
seen the code to be executed within the thread, which includes processing for
steps A and B:
let mut child_handles = vec![];
for dir in dir_lines {
let dir = dir.unwrap();
let src_stats = Arc::clone(&stats_counter);
Note that at the end of the thread-related code, we are accumulating the thread
handle in the child_handles vector.
4. Let's look at the last part of the code now. As discussed earlier, in order to ensure
that the main thread does not complete before the child threads are completed, we
have to join the child thread handles with the main thread. Also, let's print out the
final value of the thread-safe stats_counter struct, which contains aggregated
source stats from all the Rust source files under the directory (updated by the
individual threads):
for handle in child_handles {
handle.join().unwrap();
}
println!(
"Source stats: {:?}",
stats_counter.lock().unwrap()
);
The complete code listing can be found in the Git repo for the chapter in src/
[Link].
Before running this program, ensure to create a file, [Link], in the root
folder of the cargo project, containing a list of directory entries with a full path, each
on a separate line.
5. Run the project with cargo run. (Note: If you are running code from the
Packt Git repo, use cargo run --bin shared-state.) You will see the
consolidated source stats printed out. Note that we have now implemented a
multi-threaded version of the project we wrote in Chapter 6, Working with Files and
Directories in Rust. As an exercise, alter this example to implement the same project
with the message-passing concurrency model.
In this section, we've seen how multiple threads can safely write to a shared value
(wrapped in Mutex and Arc) that is stored in process heap memory, in a thread-safe
manner. In the next section, we will review one more mechanism available to control
thread execution, which is to selectively pause the processing of the current thread.
Pausing thread execution with timers
use std::thread;
use std::time::Duration;
fn main() {
let duration = Duration::new(1,0);
println!("Going to sleep");
thread::sleep(duration);
println!("Woke up");
}
Using the sleep() function is fairly straightforward, but this blocks the current
thread and it is important to make judicious use of this in a multi-threaded program.
An alternative to using sleep() would be to use an async programming model to
implement threads with non-blocking I/O.
Summary
In this chapter, we covered the basics of concurrency and multi-threaded programming
in Rust. We started by reviewing the need for concurrent programming models. We
understood the differences between the concurrent and parallel execution of programs. We
learned how to spawn new threads using two different methods. We handled errors using
a special Result type in the thread module and also learned how to check whether the
current thread is panicking. We looked at how threads are laid out in process memory. We
discussed two techniques for synchronizing processing across threads – message-passing
concurrency and shared-state concurrency, with practical examples. As a part of this,
we learned about channels, Mutex and Arc in Rust, and the role they play in writing
concurrent programs. We then discussed how Rust classifies data types as thread-safe or
not, and saw how to pause the execution of the current thread.
This concludes the chapter on managing concurrency in Rust. This also concludes
Section 2 of this book, which is on managing and controlling system resources in Rust.
We will now move on to the last part of the book – Section 3 covering advanced topics. In
the next chapter, we will cover how to perform device I/O in Rust, and internalize learning
through an example project.
Section 3:
Advanced Topics
This section covers advanced topics, including working with peripheral devices, network
primitives and TCP/UDP communications, unsafe Rust, and interacting with other
programming languages. Example projects include writing a program to detect details
of connected USB devices, writing a TCP reverse proxy with an origin server, and an
example of FFI.
This section comprises the following chapters:
By the end of this chapter, you will have learned how to work with standard readers and
writers, which constitute the foundation of any I/O operation. You'll also learn how to
optimize system calls through the use of buffered reads and writes. We'll cover reading
and writing to standard I/O streams of a process and handling errors from I/O operations,
as well as learning ways to iterate over I/O. These concepts will be reinforced through an
example project.
Technical requirements
Verify that rustup, rustc, and cargo have been installed correctly with the
following command:
rustup --version
rustc --version
cargo --version
The Git repo for the code in this chapter can be found at https://github.com/PacktPublishing/Practical-System-Programming-for-Rust-Developers/tree/master/Chapter10/usb.
For running and testing the project in this chapter, you must have the native libusb library
installed where it can be found by pkg-config.
The project in this chapter has been tested on macOS Catalina 10.15.6.
For instructions on building and testing on Windows, refer to: https://github.com/dcuddeback/libusb-rs/issues/20
For general instructions on the environment setup of the libusb crate, refer to:
[Link]
The operating system (specifically the kernel) accepts system calls from the user programs
for device access and control, and then uses the respective device driver to physically
access and control the device. Figure 10.1 illustrates how user space programs (for
example, Rust programs that use the standard library to talk to the operating system
kernel) use system calls to manage and control various types of devices:
Types of devices
In Unix/Linux, devices are broadly classified into three types:
• Character devices send or receive data as a serial stream of bytes. Examples are
terminals, keyboards, mice, printers, and sound cards. Unlike regular files, data
cannot be accessed at random but only sequentially.
• Block devices store information in fixed-size blocks and allow random access to
these blocks. Filesystems, hard disks, tape drives, and USB cameras are examples of
block devices. A filesystem is mounted on a block device.
• Network devices are similar to character devices as data is read serially, but there
are some differences. Data is sent in variable-length packets using a network
protocol, which the operating system and the user program have to deal with.
A network adaptor is usually a hardware device (with some exceptions, such as
the loopback interface, which is a software interface) that interfaces to a network
(such as Ethernet or Wi-Fi).
A hardware device is identified by its type (block or character) and a device number. The
device number in turn is split into a major and minor device number.
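To make the major/minor split concrete, the following sketch queries a device file from Rust through the std::os::unix::fs extension traits. It is a minimal sketch, assuming a Linux system; the major/minor split shown follows the traditional Linux encoding of the packed device number returned by rdev():

```rust
use std::fs;
use std::os::unix::fs::{FileTypeExt, MetadataExt};

fn main() {
    // /dev/null is a character device on Linux
    let meta = fs::metadata("/dev/null").expect("cannot stat /dev/null");
    let file_type = meta.file_type();
    println!("Character device? {}", file_type.is_char_device());
    println!("Block device? {}", file_type.is_block_device());

    // rdev() returns the packed device number; split it into its
    // major and minor parts using the traditional Linux encoding
    let rdev = meta.rdev();
    let major = (rdev >> 8) & 0xfff;
    let minor = (rdev & 0xff) | ((rdev >> 12) & !0xff);
    println!("Major: {}, minor: {}", major, minor);
}
```

On a typical Linux system, this reports /dev/null as a character device with major number 1.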
When new hardware is connected, the kernel needs a device driver that is compatible
with the device and can operate the device controller hardware. A device driver, as
discussed earlier, is essentially a shared library of low-level, hardware-handling functions
that can operate in a privileged manner as part of the kernel. Without device drivers, the
kernel does not know how to operate the device. When a program attempts to connect
to a device, the kernel looks up associated information in its tables and transfers control
to the device driver. There are separate tables for block and character devices. The device
driver performs the required task on the device and returns control back to the operating
system kernel.
As an example, let's look at a web server sending a page to a web browser. The data is
structured as an HTTP response message with the web page (HTML) sent as part of its
data payload. The data itself is stored in the kernel in a buffer (data structure), which is
then passed to the TCP layer, then to the IP layer, on to the Ethernet device driver, then
to the Ethernet adaptor, and onward to the network. The Ethernet device driver does not
know anything about connections and only handles data packets. Similarly, when data
needs to be stored to a file on the disk, the data is stored in a buffer, which is passed on
to the filesystem device driver and then onward to the disk controller, which then saves
it to the disk (for example, hard disk, SSD, and so on). Essentially, the kernel relies on a
device driver to interface with the device.
Device drivers are usually part of the kernel (kernel device driver), but there are also user
space device drivers, which abstract out the details of kernel access. Later in this chapter,
we will be using one such user space device driver to detect USB devices.
We've discussed the basics of device I/O, including device drivers and types of devices in
Unix-like systems, in this section. Starting from the next section, we'll focus on how to do
device-independent I/O using features from the Rust Standard Library.
Working with Device I/O
The following example shows how to read from a file into a memory buffer, using the
Read trait:

use std::fs::File;
use std::io::Read;

fn main() {
    // Open a file
    let mut f = File::open("[Link]").unwrap();
    //Create a memory buffer to read from file
    let mut buffer = [0; 1024];
    // read from file into buffer
    let _ = f.read(&mut buffer[..]).unwrap();
}
Create a file called [Link] in the project root and run the program with cargo
run. You can optionally print out the value of the buffer, which will display the raw bytes.
Doing buffered reads and writes
Read and Write are byte-based interfaces, which can get inefficient as they involve
continual system calls to the operating system. To overcome this, Rust also provides two
structs to enable doing buffered reads and writes – BufReader and BufWriter, which
have a built-in buffer and reduce the number of calls to the operating system.
The previous example can be rewritten as shown here, to use BufReader:
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    // Open a file
    let f = File::open("[Link]").unwrap();
    // Create a BufReader, passing in the file handle
    let mut buf_reader = BufReader::new(f);
    //Create a memory buffer to read from file
    let mut buffer = String::new();
    // read a line into the buffer
    buf_reader.read_line(&mut buffer).unwrap();
    println!("Read the following: {}", buffer);
}
The key changes from the previous version are the use of BufReader and the BufRead
trait, which is brought into scope. Instead of reading directly from the file
handle, we create a BufReader instance and read a line into this struct. The BufReader
methods internally optimize calls to the operating system. Run the program and verify
that the value from the file is printed correctly.
BufWriter similarly buffers writes to the disk, thus minimizing system calls. It can be
used in a similar manner as shown in the following code:
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() {
    // Create a file
    let f = File::create("[Link]").unwrap();
    // Create a BufWriter, passing in the file handle
    let mut buf_writer = BufWriter::new(f);
    //Create a memory buffer
    let buffer = String::from("Hello, testing");
    // Write from the buffer into the BufWriter instance
    buf_writer.write(buffer.as_bytes()).unwrap();
}
In the code shown, we're creating a new file to write into, and are also creating a new
BufWriter instance. We then write a value from the buffer into the BufWriter
instance. Run the program and verify that the specified string value has been written to
a file with the name [Link] in the project root directory. Note that here, in addition
to BufWriter, we also have to bring the Write trait into scope as this contains the
write() method.
Note when to use and when not to use BufReader and BufWriter:
• BufReader and BufWriter speed up programs that make small and frequent
reads or writes to a disk. If the reads or writes only occasionally involve large-sized
data, they do not offer any benefit.
• BufReader and BufWriter do not help while reading from or writing to
in-memory data structures.
In this section, we saw how to do both unbuffered and buffered reads and writes. In the
next section, we'll learn how to work with standard inputs and outputs of a process.
The code example here shows how to interact with the standard input and standard output
streams of a process. In the code shown, we are reading a line from the standard input
into a buffer. We're then writing back the contents of the buffer to the standard output of
the process. Note that here, the word process refers to the running program that you have
written. You are essentially reading from and writing to the standard input and standard
output, respectively, of the running program:
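The original listing is not reproduced in this copy; a minimal sketch of such an echo program:

```rust
use std::io::Write;

fn main() {
    let mut buffer = String::new();
    // Read a line from the standard input stream into the buffer
    std::io::stdin().read_line(&mut buffer).unwrap();
    // Write the buffer contents back to the standard output stream
    std::io::stdout().write_all(buffer.as_bytes()).unwrap();
}
```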
Run the program with cargo run, enter some text, and hit the Enter key. You'll see the
text echoed back on the terminal.
Stdin, which is a handle to the input stream of a process, is a shared reference to a
global buffer of input data. Likewise, Stdout, which is the output stream of a process,
is a shared reference to a global data buffer. Since Stdin and Stdout are references to
shared data, to ensure exclusive use of these data buffers, the handles can be locked. For
example, the StdinLock struct in the std::io module represents a locked reference to
the Stdin handle. Likewise, the StdoutLock struct in the std::io module represents
a locked reference to the Stdout handle. Examples of how to use the locked reference are
shown in the code example here:
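As with the echo example, the listing itself did not survive in this copy; a minimal sketch of locking both handles before use:

```rust
use std::io::{BufRead, Write};

fn main() {
    let stdin = std::io::stdin();
    let stdout = std::io::stdout();
    // Lock the handles to get exclusive access to the
    // shared input and output data buffers
    let mut locked_in = stdin.lock();
    let mut locked_out = stdout.lock();
    let mut buffer = String::new();
    // StdinLock implements BufRead, so read_line is available
    locked_in.read_line(&mut buffer).unwrap();
    locked_out.write_all(buffer.as_bytes()).unwrap();
}
```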
In the code shown, the standard input and output stream handles are locked before
reading and writing to them.
We can similarly write to the standard error stream. A code example is shown here:
use std::io::Write;

fn main() {
    //Create a memory buffer
    let buffer = b"Hello, this is an error message from standard error stream\n";
    // Get handle to output error stream
    let stderr_handle = std::io::stderr();
    // Lock the handle to output error stream
    let mut locked_stderr_handle = stderr_handle.lock();
    // write into error stream from buffer
    locked_stderr_handle.write(buffer).unwrap();
}
In the code shown, we're constructing a handle to the standard error stream using the
stderr() function. Then, we're locking this handle and then writing some text to it.
In this section, we've seen how to interact with the standard input, standard output, and
standard error streams of a process using the Rust Standard Library. Recall that in the
previous chapter on managing concurrency, we saw how, from a parent process, we can
read from and write to the standard input and output streams of the child process.
In the next section, let's look at a couple of functional programming constructs that can be
used for I/O in Rust.
Chaining and iterators over I/O
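The listing that the following paragraph describes is missing from this copy; a minimal sketch of it:

```rust
use std::io::{self, BufRead, BufReader};

fn main() {
    // Pass the stdin handle to a BufReader
    let buf_reader = BufReader::new(io::stdin());
    // lines() returns an iterator over the lines of the reader
    for line in buf_reader.lines() {
        println!("{}", line.unwrap());
    }
}
```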
In the code shown, we have created a handle to the standard input stream and passed it to
a BufReader struct. This struct implements the BufRead trait, which has a lines()
method that returns an iterator over the lines of the reader. This helps us to type inputs on
the terminal line by line and have it read by our running program. The text entered on the
terminal is echoed back to the terminal. Execute cargo run, and type some text, and
then hit the Enter key. Repeat this step as many times as you'd like. Exit from the program
with Ctrl + C.
Likewise, the iterator can be used to read line by line from a file (instead of from standard
input, which we saw in the previous example). A code snippet is shown here:
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    // Open a file for reading
    let f = File::open("[Link]").unwrap();
    //Create a BufReader instance to optimize sys calls
    let buf_reader = BufReader::new(f);
    // Iterate over the lines of the file and print them
    for line in buf_reader.lines() {
        println!("{}", line.unwrap());
    }
}
Create a file called [Link] in the project root directory. Enter a few lines of text in this
file. Then, run the program using cargo run. You'll see the file contents printed out to
the terminal.
We've so far seen how to use iterators from the std::io module. Let's now look at
another concept: chaining.
The Read trait in the std::io module has a chain() method, which allows us to chain
multiple readers together into one handle. Here is an example of how to create a
single chained handle combining two files, and how to read from this handle:
use std::fs::File;
use std::io::Read;

fn main() {
    // Open two file handles for reading
    let f1 = File::open("file1.txt").unwrap();
    let f2 = File::open("file2.txt").unwrap();
    //Chain the two file handles
    let mut chained_handle = f1.chain(f2);
    // Create a buffer to read into
    let mut buffer = String::new();
    // Read from chained handle into buffer
    chained_handle.read_to_string(&mut buffer).unwrap();
    // Print out the value read into the buffer
    println!("Read from chained handle:\n{}", buffer);
}
Note the statement using the chain() method. The rest of
the code is fairly self-explanatory, as it is similar to what we've seen in previous examples.
Make sure to create two files, file1.txt and file2.txt, under the project root folder
and enter a few lines of text in each. Run the program with cargo run. You'll see the
data from both files printed out line by line.
In this section, we've seen how to use iterators and how to chain readers together. In the
next section, let's take a look at error handling for I/O operations.
Handling errors and returning values
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    // Open two file handles for reading
    let f1 = File::open("file1.txt")?;
    let f2 = File::open("file2.txt")?;
    //Chain the two file handles
    let mut chained_handle = f1.chain(f2);
    // Create a buffer to read into
    let mut buffer = String::new();
    // Read from chained handle into buffer
    chained_handle.read_to_string(&mut buffer)?;
    println!("Read from chained handle: {}", buffer);
    Ok(())
}
Note the use of the ? operator and the std::io::Result return type for error
handling. Run the program with cargo run, this time making sure that neither
file1.txt nor file2.txt exists in the project root folder.
You'll see the error message printed to the terminal.
In the code we've just seen, we're just propagating the error received from the operating
system while making the calls. Let's now try to handle the errors in a more active manner.
The code example here shows custom error handling for the same code:
use std::fs::File;
use std::io::Read;

fn read_files(handle: &mut impl Read) -> std::io::Result<String> {
    // Create a buffer to read into
    let mut buffer = String::new();
    // Read from chained handle into buffer
    handle.read_to_string(&mut buffer)?;
    Ok(buffer)
}

fn main() {
    let mut chained_handle;
    // Open two file handles for reading
    let file1 = "file1.txt";
    let file2 = "file2.txt";
    if let Ok(f1) = File::open(file1) {
        if let Ok(f2) = File::open(file2) {
            //Chain the two file handles
            chained_handle = f1.chain(f2);
            let content = read_files(&mut chained_handle);
            match content {
                Ok(text) => println!("Read from chained handle:\n{}", text),
                Err(e) => println!("Error occurred in reading files: {}", e),
            }
        } else {
            println!("Unable to read {}", file2);
        }
    } else {
        println!("Unable to read {}", file1);
    }
}
You'll notice that we've created a new function that returns std::io::Result to the
main() function. We're handling errors in various operations, such as reading from a file
and reading from the chained readers.
First, run the program with cargo run, ensuring that both file1.txt and
file2.txt exist. You'll see the contents from both files printed to the terminal. Rerun the
program by removing one of these files. You should see the custom error message from
our code.
With this, we conclude the section on handling errors. Let's now move on to the last
section of the chapter, where we will go through a project to detect and display details of
USB devices connected to a computer.
Getting details of connected USB devices (project)
• When a USB device is plugged into a computer, the electrical signals on the
computer bus trigger the USB controller (hardware device) on the computer.
• The USB controller raises an interrupt on the CPU, which then executes the
interrupt handler registered for that interrupt in the kernel.
• When a call is made from the Rust program through the Rust libusb wrapper
crate, the call is routed to the libusb C library, which in turn makes a system call
on the kernel to read the device file corresponding to the USB device. We've seen
earlier in this chapter how Unix/Linux enables standard syscalls, such as read()
and write(), for I/O.
• When the system call returns from the kernel, the libusb library returns the value
from the syscall to our Rust program.
We're using the libusb library because writing a USB device driver from scratch requires
implementing the USB protocol specifications, and writing device drivers is the subject of
a separate book in itself. Let's look at the design of our program:
Figure 10.2 shows the structs and functions in the program. Here is a description of the
data structures:
3. We'll now look at the code in parts. Add all the code for this project in usb/src/
[Link].
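The import listing referred to in the next paragraph is missing from this copy. Based on that description, it would look roughly like the following fragment (the exact set of libusb items imported is an assumption):

```rust
use libusb::{Context, Device, DeviceHandle};
use std::fmt;
use std::fs::File;
use std::io::Write;
use std::result::Result;
use std::time::Duration;
```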
We're importing the libusb modules and a few modules from the Rust
Standard Library. fs::File and io::Write are for writing to an output file,
result::Result is the return value from the functions, and time::Duration
is for working with the libusb library.
4. Let's look at the data structures now:
#[derive(Debug)]
struct USBError {
    err: String,
}

struct USBList {
    list: Vec<USBDetails>,
}

#[derive(Debug)]
struct USBDetails {
    manufacturer: String,
    product: String,
    serial_number: String,
    bus_number: u8,
    device_address: u8,
    vendor_id: u16,
    product_id: u16,
    maj_device_version: u8,
    min_device_version: u8,
}
USBError is for custom error handling, USBList is to store a list of the USB devices
detected, and USBDetails is to capture the list of details for each USB device.
5. Let's implement the Display trait for the USBList struct so that custom
formatting can be done to print the contents of the struct:
impl fmt::Display for USBList {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        Ok(for usb in &self.list {
            writeln!(f, "\nUSB Device details")?;
            writeln!(f, "Manufacturer: {}", usb.manufacturer)?;
            writeln!(f, "Product: {}", usb.product)?;
            writeln!(f, "Serial number: {}", usb.serial_number)?;
            writeln!(f, "Bus number: {}", usb.bus_number)?;
            writeln!(f, "Device address: {}", usb.device_address)?;
            writeln!(f, "Vendor Id: {}", usb.vendor_id)?;
            writeln!(f, "Product Id: {}", usb.product_id)?;
            writeln!(f, "Major device version: {}", usb.maj_device_version)?;
            writeln!(f, "Minor device version: {}", usb.min_device_version)?;
        })
    }
}
6. Next, we'll implement From traits for the USBError struct so that errors from the
libusb crate and from the Rust Standard Library are automatically converted into
the USBError type when we use the ? operator:
impl From<libusb::Error> for USBError {
    fn from(_e: libusb::Error) -> Self {
        USBError {
            err: "Error in accessing USB device".to_string(),
        }
    }
}

impl From<std::io::Error> for USBError {
    fn from(e: std::io::Error) -> Self {
        USBError { err: e.to_string() }
    }
}
7. Let's next look at the function to write the details retrieved for all the attached
devices to an output file:
//Function to write details to output file
fn write_to_file(usb: USBList) -> Result<(), USBError> {
    let mut file_handle = File::create("usb_details.txt")?;
    write!(file_handle, "{}\n", usb)?;
    Ok(())
}
In the main() function, we're first creating a new libusb Context that
can return the list of connected devices. We are then iterating through the
device list obtained from the Context struct, and calling the get_device_
information() function for each USB device. The details are finally also printed
out to an output file by calling the write_to_file() function that we saw earlier.
2. To wrap up the code, let's write the function to get the device details:
// Function to print device information
fn get_device_information(device: Device, handle: &DeviceHandle) -> Result<USBDetails, USBError> {
    let device_descriptor = device.device_descriptor()?;
    let timeout = Duration::from_secs(1);
    let languages = handle.read_languages(timeout)?;
    let language = languages[0];
    // Get device manufacturer name
    let manufacturer = handle.read_manufacturer_string(language, &device_descriptor, timeout)?;
    // Get device USB product name
    let product = handle.read_product_string(language, &device_descriptor, timeout)?;
    // Get device serial number
    let serial_number = handle.read_serial_number_string(language, &device_descriptor, timeout)?;
    // Populate the USBDetails struct
    Ok(USBDetails {
        manufacturer,
        product,
        serial_number,
        bus_number: device.bus_number(),
        device_address: device.address(),
        vendor_id: device_descriptor.vendor_id(),
        product_id: device_descriptor.product_id(),
        maj_device_version: device_descriptor.device_version().major(),
        min_device_version: device_descriptor.device_version().minor(),
    })
}
This concludes the code. Make sure to plug in a USB device (such as a thumb drive) to the
computer. Run the code with cargo run. You should see the list of attached USB devices
printed to the terminal, and also written to the output usb_details.txt file.
Note that in this example, we have demonstrated how to do file I/O using both an external
crate (for retrieving USB device details) and the standard library (for writing to an output
file). We've unified error handling using a common error handling struct, and automated
conversions of error types to this custom error type.
The Rust crates ecosystem (crates.io) has similar crates to interact with other types of
devices and filesystems. You can experiment with them.
This concludes the section on writing a program to retrieve USB details.
Summary
In this chapter, we reviewed the foundational concepts of device management in Unix/
Linux. We looked at how to do buffered reads and writes using the std::io module.
We then learned how to interact with the standard input, standard output, and standard
error streams of a process. We also saw how to chain readers together and use iterators for
reading from devices. We then looked at the error handling features with the std::io
module. We concluded with a project to detect the list of connected USB devices and
printed out the details of each USB device both to the terminal and to an output file.
The Rust Standard Library provides a clean layer of abstraction for doing I/O operations
on any type of device. This encourages the Rust ecosystem to implement these standard
interfaces for any type of device, enabling Rust system programmers to interact with
different devices in a uniform manner. Continuing on the topic of I/O, in the next chapter,
we will learn how to do network I/O operations using the Rust Standard Library.
11
Learning Network Programming
In the previous chapter, we learned how to communicate with peripheral devices from
Rust programs. In this chapter, we will switch our focus to another important system
programming topic – networking.
Most modern operating systems, including Unix/Linux and Windows variants, have
native support for networking using TCP/IP. Do you know how you can use TCP/IP to
send byte streams or messages from one computer to another? Do you want to know
what kind of language support Rust provides for synchronous network communications
between two processes running on different machines? Are you interested in learning the
basics of configuring TCP and UDP sockets, and working with network addresses and
listeners in Rust? Then, read on.
We will cover these topics in the following order:
By the end of this chapter, you will have learned how to work with network addresses,
determine address types, and do address conversions. You will also learn how to create
and configure sockets and query on them. You will work with TCP listeners, create a TCP
socket server, and receive data. Lastly, you'll put these concepts into practice through an
example project.
It is important to learn these topics because sockets-based programming using TCP or
UDP forms the basis for writing distributed programs. Sockets help two processes on
different (or even the same) machines to establish communication with each other and
exchange information. They form the foundation for practically all web and distributed
applications on the internet, including how an internet browser accesses a web page and
how a mobile application retrieves data from an API server. In this chapter, you will learn
what kind of support is provided by the Rust standard library for socket-based network
communications.
Technical requirements
Verify that rustup, rustc, and cargo have been installed correctly with the
following commands:
rustup --version
rustc --version
cargo --version
The Git repo for the code in this chapter can be found at https://github.com/PacktPublishing/Practical-System-Programming-for-Rust-Developers/tree/master/Chapter11.
The next layer up in the TCP/IP protocol suite is the transport layer. Here, there are two
popular protocols used on the internet – TCP and UDP. TCP stands for Transmission
Control Protocol, and UDP for User Datagram Protocol. While the network (IP) layer is
concerned with sending data packets between two hosts, the transport layer (TCP and
UDP) is concerned with sending data streams between two processes (applications or
programs) running on the same host or different hosts.
If there are two applications running on a single host IP address, the way to uniquely
identify each application is by using a port number. Each application that is involved in
network communications listens on a specific port, which is a 16-bit number.
Examples of popular ports are 80 for the HTTP protocol, 443 for the HTTPS protocol,
and 22 for the SSH protocol. The combination of an IP address and a port number
is called a socket. We'll see in this chapter how to work with sockets using the Rust
standard library. UDP, like IP, is connectionless and does not incorporate any reliability
mechanisms. But it is fast and has a low overhead compared to TCP. It is used in higher-
level services, such as DNS, to get host IP addresses corresponding to a domain name.
Compared to UDP, TCP provides a connection-oriented, reliable communication channel
between two endpoints (application/user space programs) over which byte streams
can be exchanged while preserving the sequence of data. It incorporates features such
as retransmission in the case of errors, acknowledgments of packets received, and
timeouts. We'll discuss TCP-based communication in detail in this chapter and later build
a reverse proxy using TCP socket-based communications.
The uppermost layer in the TCP/IP protocol suite is the application layer. While the TCP
layer is connection-oriented and works with byte streams, it has no knowledge of the
semantics of a message transmitted. This is provided by the application layer. For example,
HTTP, which is the most popular application protocol on the internet, uses HTTP request
and response messages to communicate between HTTP clients (for example, internet
browsers) and HTTP servers (for example, web servers). The application layer reads the
byte streams received from the TCP layer and interprets them into HTTP messages, which
are then processed by the application program that we write in Rust or other languages.
There are several libraries (or crates) available in the Rust ecosystem that implement the
HTTP protocol, so Rust programs can leverage them (or write their own) to send and
receive HTTP messages. In the example project for this chapter, we will write some code to
interpret an incoming HTTP request message and send back an HTTP response message.
The primary Rust Standard Library module for networking communications is std::net.
This focuses on writing code for communicating using TCP and UDP. The Rust std::net
module does not deal directly with the data link layer or application layer of the TCP/
IP protocol suite. With this background, we are ready to understand the networking
primitives provided in the Rust standard library for TCP and UDP communications.
• Ipv4Addr: This is a struct that stores a 32-bit integer representing an IPv4 address,
and provides associated functions and methods to set and query address values.
• Ipv6Addr: This is a struct that stores a 128-bit integer representing an IPv6 address,
and provides associated functions and methods to query and set address values.
• SocketAddrV4: This is a struct representing an internet domain socket. It stores
an IPv4 address and a 16-bit port number and provides associated functions and
methods to set and query socket values.
• SocketAddrV6: This is a struct representing an internet domain socket. It stores
an IPv6 address and a 16-bit port number and provides associated functions and
methods to set and query socket values.
• IpAddr: This is an enum with two variants – V4(Ipv4Addr) and
V6(Ipv6Addr). This means that it can hold either an IPv4 host address or an IPv6
host address.
Note
The size of an Ipv6Addr struct might vary, depending on the target operating
system architecture.
Let's now see a few examples of how to use them. We'll start by creating IPv4 and IPv6
addresses.
In the example shown next, we're creating IPv4 and IPv6 addresses using the std::net
module and using built-in methods to query on the created addresses. The is_
loopback() method confirms whether the address corresponds to localhost, and
the segments() method returns the various segments of the IP address. Note also that
the std::net module provides a special constant, Ipv4Addr::LOCALHOST, which
can be used to initialize the IP address with the localhost (loopback) address:
use std::net::{Ipv4Addr, Ipv6Addr};

fn main() {
    // Create a new IPv4 address with four 8-bit integers
    let ip_v4_addr1 = Ipv4Addr::new(106, 201, 34, 209);
    // Use the built-in constant to create a new loopback
    // (localhost) address
    let ip_v4_addr2 = Ipv4Addr::LOCALHOST;
    println!(
        "Is ip_v4_addr1 a loopback address? {}",
        ip_v4_addr1.is_loopback()
    );
    println!(
        "Is ip_v4_addr2 a loopback address? {}",
        ip_v4_addr2.is_loopback()
    );
    //Create a new IPv6 address with eight 16-bit
    // integers, represented in hex
    let ip_v6_addr = Ipv6Addr::new(2001, 0000, 3238, 0xDFE1, 0063, 0000, 0000, 0xFEFB);
    // Print the segments of the IPv6 address
    println!("IPv6 segments: {:?}", ip_v6_addr.segments());
}
The following example shows how to use the IpAddr enum. In this example, usage of the
IpAddr enum is shown to create IPv4 and IPv6 addresses. The IpAddr enum helps us to
define IP addresses in a more generic way in our program data structures and gives us the
flexibility to work with both IPv4 and IPv6 addresses in our programs:
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn main() {
    // Create an ipv4 address
    let ip_v4_addr = IpAddr::V4(Ipv4Addr::new(106, 201, 34, 209));
    // check if an address is ipv4 or ipv6 address
    println!("Is ip_v4_addr an ipv4 address? {}", ip_v4_addr.is_ipv4());
    println!("Is ip_v4_addr an ipv6 address? {}", ip_v4_addr.is_ipv6());
    // Create an ipv6 address using the loopback constant
    let ip_v6_addr = IpAddr::V6(Ipv6Addr::LOCALHOST);
    println!("Is ip_v6_addr an ipv6 address? {}", ip_v6_addr.is_ipv6());
}
Let's now turn our attention to sockets. As discussed earlier, sockets comprise an IP
address and a port. Rust has separate data structures for both IPv4 and IPv6 sockets.
Let's see an example next. Here, we're creating a new IPv4 socket, and querying for the
IP address and port numbers from the constructed socket, using the ip() and port()
methods, respectively:
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

fn main() {
    // Create a new IPv4 socket from an IP address and a port
    // (the address and port values here are illustrative)
    let socket = SocketAddr::new(IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)), 8000);
    println!("IP address: {}, port: {}", socket.ip(), socket.port());
    println!("Is this IPv6 socket? {}", socket.is_ipv6());
}
IP addresses and sockets represent the foundational data structures for network
programming using the Rust standard library. In the next section, we'll see how to write
programs in Rust that can communicate over TCP and UDP protocols.
tcpudp/src/bin/[Link]
use std::net::UdpSocket;
use std::str;

fn main() {
    // Bind the UDP server socket to a local address and port
    let socket = UdpSocket::bind("127.0.0.1:3000").expect("Unable to bind to socket");
    let mut buffer = [0u8; 1024];
    // Receive a datagram into the buffer and print it
    let (num_bytes, src_addr) = socket.recv_from(&mut buffer).expect("No data received");
    println!(
        "Received from {}: {}",
        src_addr,
        str::from_utf8(&buffer[..num_bytes]).unwrap()
    );
}
tcpudp/src/bin/[Link]
use std::net::UdpSocket;

fn main() {
    // Create a local UDP socket
    let socket = UdpSocket::bind("127.0.0.1:0").expect("Unable to bind to socket");
    // Connect the socket to a remote socket
    socket
        .connect("127.0.0.1:3000")
        .expect("Could not connect to UDP server");
    println!("socket peer addr is {:?}", socket.peer_addr());
    // Send a datagram to the remote socket
    socket
        .send("Hello: sent using send() call".as_bytes())
        .expect("Unable to send bytes");
}
From a separate terminal, run the UDP client with the following:
You'll see the message received at the server, which was sent from the client.
We've seen so far how to write programs in Rust to do communications over UDP. Let's
now look at how TCP communications are done.
tcpudp/src/bin/[Link]
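The TCP server listing did not survive in this copy. Here is a condensed, self-contained sketch of the same idea: a listener accepts a connection, reads the client's bytes, and echoes them back. The demonstration client thread and port number 3000 are assumptions added so the sketch runs standalone; in the book's project the client is a separate binary:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

fn main() {
    // Bind a TCP listener to a local socket address
    let listener = TcpListener::bind("127.0.0.1:3000").expect("Unable to bind");
    // Demonstration client thread, so this sketch is self-contained
    let client = thread::spawn(|| {
        let mut stream = TcpStream::connect("127.0.0.1:3000").expect("Connect failed");
        stream.write_all(b"Hello from client").expect("Write failed");
        let mut buf = [0u8; 1024];
        let n = stream.read(&mut buf).expect("Read failed");
        println!("Client got echo: {}", String::from_utf8_lossy(&buf[..n]));
    });
    // Accept one connection (a real server would loop over incoming())
    let (mut stream, _peer) = listener.accept().expect("Accept failed");
    let mut buffer = [0u8; 1024];
    // Read the client's bytes and echo them back on the same stream
    let num_bytes = stream.read(&mut buffer).expect("Read failed");
    stream.write_all(&buffer[..num_bytes]).expect("Write failed");
    client.join().unwrap();
}
```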
This concludes the code for the TCP server. Let's now write a TCP client to send some
data to the TCP server.
In the TCP client code shown next, we're using the TcpStream::connect function
to connect to a remote socket where the server is listening. This function returns a TCP
stream, which can be read from and written to (as we saw in the previous example). Here,
we're first going to write some data to the TCP stream, and then read back the response
received from the server:
tcpudp/src/bin/[Link]
From a separate terminal, run the TCP client with the following:
You'll see the message that was sent from the client being received at the server and
echoed back.
This concludes this section on performing TCP and UDP communications using the Rust
standard library. In the next section, let's use the concepts learned so far to build a TCP
reverse proxy.
Forward proxies act as gateways to the internet for a group of client machines. They help
individual client machines to hide their IP addresses while browsing the internet. They
also help to enforce organizational policies for machines within a network to access the
internet, such as restricting websites to visit.
While a forward proxy acts on behalf of clients, a reverse proxy acts on behalf of hosts
(for example, web servers). They hide the identity of the backend servers from the clients.
The clients only make a request to the reverse proxy server address/domain, and the
reverse proxy server, in turn, knows how to route that request to the backend server
(also sometimes called the origin server), and returns the response received from the
origin server to the requesting client. A reverse proxy can also be used to perform other
functions, such as load balancing, caching, and compression. We will, however, just focus
on demonstrating the core concept of a reverse proxy by directing requests received from
clients to the backend origin servers and routing responses back to the requesting client.
To demonstrate a working reverse proxy, we will build two servers:
• An origin server, which accepts HTTP requests and serves order-status responses
• A reverse proxy server, which routes incoming client requests to the origin server and returns the responses received
tcpproxy/src/bin/[Link]
Next, let's declare a struct to hold the incoming HTTP request line (the first line of the
multi-line HTTP request message). We'll also write some helper methods for this struct.
In the code shown next, we'll declare a RequestLine struct consisting of three fields –
the HTTP method, the path of the resource requested, and the HTTP protocol version
supported by the internet browser or another HTTP client sending the request. We'll also
write some methods to return the values of the struct members. Custom logic will be
implemented for the get_order_number() method. If we get a request for a resource
with the /order/status/1 path, we will split this string by /, and return the last part
of the string, which is order number 1:
tcpproxy/src/bin/[Link]
#[derive(Debug)]
struct RequestLine {
    method: Option<String>,
    path: Option<String>,
    protocol: Option<String>,
}
Writing a TCP reverse proxy (project) 325
impl RequestLine {
    fn method(&self) -> String {
        if let Some(method) = &self.method {
            method.to_string()
        } else {
            String::from("")
        }
    }
    fn path(&self) -> String {
        if let Some(path) = &self.path {
            path.to_string()
        } else {
            String::from("")
        }
    }
    fn get_order_number(&self) -> String {
        let path = self.path();
        let path_tokens: Vec<String> = path
            .split("/")
            .map(|s| s.parse().unwrap())
            .collect();
        path_tokens[path_tokens.len() - 1].clone()
    }
}
Let's also implement the FromStr trait for the RequestLine struct so that we can
convert the incoming HTTP request line (string) into our internal Rust data structure –
RequestLine. The structure of the HTTP request line is shown here:

<HTTP method> <path> <protocol version>

For example: GET /order/status/1 HTTP/1.1
These three values are separated by white spaces and are all present in the first line of an
HTTP request message. In the program shown, we're going to parse these three values and
load them into the RequestLine struct. Later, we will further parse the path member
and extract the order number from it, for processing:
tcpproxy/src/bin/[Link]
impl FromStr for RequestLine {
    type Err = ();
    fn from_str(msg: &str) -> Result<Self, Self::Err> {
        // Split the request line on whitespace into its three parts
        let mut msg_tokens = msg.split_ascii_whitespace();
        let method = msg_tokens.next().map(String::from);
        let path = msg_tokens.next().map(String::from);
        let protocol = msg_tokens.next().map(String::from);
        Ok(Self { method, path, protocol })
    }
}
We've so far seen the module imports, struct definition, and methods for the
RequestLine struct. Let's now write the main() function.
1. Read the first line of the incoming HTTP request message and convert it into a
RequestLine struct.
2. Construct the HTTP response message and write it to the TCP stream.
Let's now see the code for the main function in two parts – starting the TCP server and
listening for connections, and processing incoming HTTP requests.
tcpproxy/src/bin/[Link]
Then, we'll listen for incoming connections, and read from the stream for each
connection:
tcpproxy/src/bin/[Link]
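Both listings for this step are elided from this extraction. A compressed, hedged sketch of the startup and request-line parsing described above follows; port 3000 matches the run output shown later, while the helper names and the stand-in parser are assumptions (the book parses via `RequestLine::from_str`):

```rust
use std::io::Read;
use std::net::TcpListener;

// Minimal stand-in for the RequestLine parsing described in the text:
// split the first line of the request on whitespace into three parts.
fn parse_request_line(line: &str) -> (String, String, String) {
    let mut tokens = line.split_ascii_whitespace();
    let method = tokens.next().unwrap_or("").to_string();
    let path = tokens.next().unwrap_or("").to_string();
    let protocol = tokens.next().unwrap_or("").to_string();
    (method, path, protocol)
}

// Accept a single connection and return the parsed request line.
// The book's server binds to port 3000 and loops over incoming().
fn serve_one(listener: &TcpListener) -> (String, String, String) {
    let (mut stream, _) = listener.accept().expect("accept failed");
    let mut buffer = [0; 512];
    let n = stream.read(&mut buffer).expect("read failed");
    let req_str = String::from_utf8_lossy(&buffer[..n]).to_string();
    // The HTTP request line is the first line of the request message
    parse_request_line(req_str.lines().next().unwrap_or(""))
}
```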
Now that we have parsed the required data into the RequestLine struct, we can process
it and send the HTTP response back. Let's see the code. If the message received is not
a GET request, if the path in the request message does not start with /order/status,
or if the order number is not provided, we construct an HTTP response message with the
404 Not Found HTTP status code:
tcpproxy/src/bin/[Link]
if req_line.method() != "GET"
    || !req_line.path().starts_with("/order/status")
    || req_line.get_order_number().len() == 0
{
    if req_line.get_order_number().len() == 0 {
        order_status = format!("Please provide valid order number");
    } else {
        order_status = format!("Sorry, this page is not found");
    }
    html_response_string = format!(
        "HTTP/1.1 404 Not Found\nContent-Type: text/html\nContent-Length:{}\n\n{}",
        order_status.len(),
        order_status
    );
}
If the request is correctly formatted to retrieve the order status for an order number, we
should construct an HTML response message with the 200 OK HTTP status code for
sending the response back to the client:
tcpproxy/src/bin/[Link]
else {
    order_status = format!(
        "Order status for order number {} is: Shipped\n",
        req_line.get_order_number()
    );
    html_response_string = format!(
        "HTTP/1.1 200 OK\nContent-Type: text/html\nContent-Length:{}\n\n{}",
        order_status.len(),
        order_status
    );
}
Lastly, let's write the constructed HTTP response message to the TCP stream:
tcpproxy/src/bin/[Link]
stream.write(html_response_string.as_bytes()).unwrap();
This concludes the code for the origin server. The complete code can be found in the Packt
GitHub repo for Chapter12 at tcpproxy/src/bin/[Link].
Run the origin server program. You should see it start with the following message:
Running on port: 3000
In a browser window, enter the following URL:
localhost:3000/order/status/2
You should see the following response displayed on the browser screen:
Order status for order number 2 is: Shipped
Try entering a URL with an invalid path, such as the following:
localhost:3000/invalid/path
localhost:3000/order/status/
tcpproxy/src/bin/[Link]
use std::env;
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::process::exit;
use std::thread;
Let's write the main() function next. When we start the reverse proxy server, let's accept
two command-line parameters, corresponding to socket addresses of the reverse proxy and
origin server, respectively. If two command-line parameters are not provided by the user,
then print out an error message and exit the program. Then, let's parse the command-line
inputs and start the server using TcpListener::bind. After binding to the local port,
we connect to the origin server and print out an error message in the case of failure to
connect.
Place the following code within the main() function block:
tcpproxy/src/bin/[Link]
let args: Vec<String> = env::args().collect();
if args.len() < 3 {
    eprintln!("Please provide proxy-from and proxy-to addresses");
    exit(2);
}
let proxy_server = &args[1];
let origin_server = &args[2];
// Start a socket server on proxy_server
let proxy_listener;
if let Ok(proxy) = TcpListener::bind(proxy_server) {
    proxy_listener = proxy;
    let addr = proxy_listener.local_addr().unwrap().ip();
    let port = proxy_listener.local_addr().unwrap().port();
    if let Err(_err) = TcpStream::connect(origin_server) {
        println!("Please re-start the origin server");
        exit(1);
    }
    println!("Running on Addr:{}, Port:{}\n", addr, port);
} else {
    eprintln!("Unable to bind to specified proxy port");
    exit(1);
}
After starting the server, we must listen for incoming connections. For every connection,
spawn a separate thread to handle the connection. The thread in turn calls the handle_
connection() function, which we will describe shortly. Then, join the child thread
handles with the main thread to make sure that the main() function does not exit before
the child threads are completed:
tcpproxy/src/bin/[Link]
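The accept-loop listing is elided from this extraction. A hedged sketch of the loop just described follows; the book's loop runs indefinitely, whereas this sketch takes a `max_conns` bound and returns the number of connections handled so it can be exercised in a test, and the handler here is only a stub for the function covered next:

```rust
use std::net::{TcpListener, TcpStream};
use std::thread;

// Stub for the per-connection handler described in the next listing.
fn handle_connection(_stream: TcpStream, origin: &str) {
    println!("would proxy to {}", origin);
}

// Accept connections, spawning a thread per connection, then join the
// handles so that main() does not exit before the children complete.
fn accept_loop(listener: &TcpListener, origin: &str, max_conns: usize) -> usize {
    let mut thread_handles = Vec::new();
    for stream in listener.incoming().take(max_conns) {
        let stream = stream.expect("Error in incoming stream");
        let origin = origin.to_string();
        thread_handles.push(thread::spawn(move || handle_connection(stream, &origin)));
    }
    let handled = thread_handles.len();
    for handle in thread_handles {
        handle.join().expect("Unable to join child thread");
    }
    handled
}
```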
This concludes the main() function. Let's now write the code for the handle_
connection() function. This contains the core logic for proxying to the origin server:
tcpproxy/src/bin/[Link]
For ease of debugging, the four key steps involved in the proxy functionality are marked in
the code and also printed out to the console:
1. In the first step, we read the incoming data from the incoming client connection.
2. In the second step, we open a new TCP stream with the origin server, and send the
data we received from the client to the origin server.
3. In the third step, we read the response received from the origin server and store
the data in a buffer.
4. In the final step, we are using the data received in the previous step to write to the
TCP stream corresponding to the client that sent the original request.
This concludes the code for the reverse proxy. We've kept the functionality simple and
handled only the base case. As an extra exercise, you can add edge cases to make the
server more robust, and also add additional functionality such as load balancing and caching.
The complete code for the reverse proxy server can be found in the Packt GitHub repo
for Chapter12 at tcpproxy/src/bin/[Link].
The first command-line parameter that we pass is used by the reverse proxy server to bind
to the specified socket address. The second command-line parameter corresponds to the
socket address at which the origin server is running. This is the address to which we have
to proxy the incoming requests.
Let's now run the same tests from a browser that we did for the origin server, only this
time we'll send the request to port 3001, where the reverse proxy server is running. You'll
notice that you will get similar response messages. This demonstrates that the requests
sent by the internet browser client are being proxied by the reverse proxy server to the
backend origin server, and the response received from the origin server is being routed
back to the browser client.
You should see the server start with the following message:
Running on Addr:127.0.0.1, Port:3001
In a browser window, enter the following URL:
localhost:3001/order/status/2
You should see the following response displayed on the browser screen:
Order status for order number 2 is: Shipped
Try entering a URL with an invalid path, such as the following:
localhost:3001/invalid/path
localhost:3001/order/status/
Summary
In this chapter, we reviewed the basics of networking in Linux/Unix. We learned about the
networking primitives in the Rust standard library, including data structures for IPv4 and
IPv6 addresses, IPv4 and IPv6 sockets, and associated methods. We learned how to create
addresses, as well as create sockets and query them.
We then learned how to use UDP sockets and wrote a UDP client and server. We also
reviewed the TCP communication basics, including how to configure TCP listeners, how
to create a TCP socket server, and how to send and receive data. Lastly, we wrote a project
consisting of two servers – an origin server and a reverse proxy server that routes requests
to the origin server.
In the next and final chapter of the book, we'll cover another important topic for system
programming – unsafe Rust and FFI.
12
Writing Unsafe Rust and FFI
In the previous chapter, we learned about the network primitives built into the Rust
Standard Library and saw how to write programs that communicate over TCP and UDP.
In this chapter, we will conclude the book by covering a few advanced topics related to
unsafe Rust and foreign function interfaces (FFIs).
We have seen how the Rust compiler enforces rules of ownership for memory and thread
safety. While this is a blessing most of the time, there may be situations when you want to
implement a new low-level data structure or call out to external programs written in other
languages. Or, you may want to perform other operations prohibited by the Rust compiler,
such as dereferencing raw pointers, mutating static variables, or dealing with uninitialized
memory. Have you wondered how the Rust Standard Library itself makes system calls to
manage resources, when system calls involve dealing with raw pointers? The answer lies in
understanding unsafe Rust and FFIs.
In this chapter, we'll first look at why and how Rust code bases use unsafe Rust code.
Then, we'll cover the basics of FFIs and talk about special considerations while working
with them. We'll also write Rust code that calls a C function, and a C program that calls a
Rust function.
By the end of this chapter, you will have learned when and how to use unsafe Rust, and
how to interface Rust with other programming languages through FFIs. You'll also get an
overview of a few advanced topics, such as
application binary interfaces (ABIs), conditional compilation, data layout conventions,
and providing instructions to the linker. Understanding these will be helpful when
building Rust binaries for different target platforms, and for linking Rust code with code
written in other programming languages.
Technical requirements
Verify that rustup, rustc, and cargo have been installed correctly with the
following command:
rustup --version
rustc --version
cargo --version
Since this chapter involves compiling C code and generating a binary, you will need to set
up the C development environment on your development machine. After setup, run the
following command to verify that the installation is successful:
gcc --version
If this command does not execute successfully, please revisit your installation.
Note
It is recommended that those developing on a Windows platform use a Linux
virtual machine to try out the code in this chapter.
The code in this section has been tested on Ubuntu 20.04 (LTS) x64 and should
work on any other Linux variant.
Introducing unsafe Rust 339
The Git repo for the code in this chapter can be found at https://github.com/
PacktPublishing/Practical-System-Programming-for-Rust-Developers/tree/master/Chapter12.
fn main() {
    let num = 23;
    let borrowed_num = &num; // immutable reference to num
    let raw_ptr = borrowed_num as *const i32; // cast
    // reference borrowed_num to raw pointer
    assert!(*raw_ptr == 23); // dereference outside an unsafe block
}
Compile this code with cargo check (or run it from Rust playground IDE). You'll see
the following error message:
Let's now modify the code by enclosing the dereferencing of the raw pointer within an
unsafe block:
fn main() {
    let num = 23;
    let borrowed_num = &num; // immutable reference to num
    let raw_ptr = borrowed_num as *const i32; // cast
    // reference borrowed_num to raw pointer
    unsafe {
        assert!(*raw_ptr == 23);
    }
}
You will see that the compilation is successful now, even though this code can potentially
cause undefined behavior. This is because, once you enclose some code within an unsafe
block, the compiler expects the programmer to ensure the safety of unsafe code.
Let's now look at the kinds of operations unsafe Rust enables:
• Dereferencing a raw pointer
• Accessing or mutating a mutable static variable
• Implementing an unsafe trait
• Calling an unsafe or external function
• Accessing fields of a union
We'll look at the first three in this section and the last two in the next section:
• You can dereference a raw pointer: Unsafe Rust has two new types called raw
pointers – *const T is a pointer type that corresponds to &T (immutable
reference type) in safe Rust, and *mut T is a pointer type that corresponds to
&mut T (mutable reference type in safe Rust). Unlike Rust reference types, these
raw pointers can have both immutable and mutable pointers to a value at the same
time or have multiple pointers simultaneously to the same value in memory. There
is no automatic cleanup of memory when these pointers go out of scope, and these
pointers can be null or refer to invalid memory locations too. The guarantees
provided by Rust for memory safety do not apply to these pointer types. Examples
of how to define and access pointers in an unsafe block are shown next:
fn main() {
    let mut a_number = 5;
    // Create an immutable pointer to the value 5
    let raw_ptr1 = &a_number as *const i32;
    // Create a mutable pointer to the value 5
    let raw_ptr2 = &mut a_number as *mut i32;
    unsafe {
        println!("raw_ptr1 is: {}", *raw_ptr1);
        println!("raw_ptr2 is: {}", *raw_ptr2);
    }
}
You'll note from this code that we've simultaneously created both an immutable raw
pointer and a mutable raw pointer to the same value, by casting from the corresponding
immutable and mutable reference types. Note that we do not need an unsafe block to
create the raw pointers, only to dereference them. This is because dereferencing a raw
pointer may result in unpredictable behavior, as the borrow checker does not take
responsibility for verifying its validity or lifetime.
This code snippet shows the declaration of a mutable static variable, THREAD_COUNT,
initialized to 4. When the main() function executes, it looks for an environmental
variable with the name THREAD_COUNT. If the env variable is found, it calls the
change_thread_count() function, which mutates the value of the static
variable in an unsafe block. The main() function then prints out the value in an
unsafe block.
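The listing for this example is elided from this extraction; a minimal sketch matching the description (a mutable static initialized to 4, overridden from the THREAD_COUNT environment variable) follows. The helper names are assumptions:

```rust
use std::env;

// A mutable static, initialized to 4; any access requires an unsafe block.
static mut THREAD_COUNT: u32 = 4;

fn change_thread_count(count: u32) {
    unsafe {
        THREAD_COUNT = count;
    }
}

fn current_thread_count() -> u32 {
    // Reading a mutable static is also unsafe
    unsafe { THREAD_COUNT }
}

// Mirror of the book's main(): override the default from the
// THREAD_COUNT environment variable if it is set.
fn init_from_env() {
    if let Ok(val) = env::var("THREAD_COUNT") {
        change_thread_count(val.parse().expect("THREAD_COUNT must be a number"));
    }
}
```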
• Implementing an unsafe trait: Let's try to understand this with an example. Let's say
we have a custom struct containing a raw pointer that we want to send or share across
threads. Recall from Chapter 9, Managing Concurrency, that for a type to be sent or
shared across threads, it needs to implement the Send or Sync traits. To implement
these two traits for the raw pointer, we have to use unsafe Rust, as shown:
struct MyStruct(*mut u16);
unsafe impl Send for MyStruct {}
unsafe impl Sync for MyStruct {}
The reason for the unsafe keyword is because raw pointers have untracked
ownership, which then becomes the responsibility of the programmer to track
and manage.
There are two more features of unsafe Rust that are related to interfacing with other
programming languages, which we will discuss in the next section on FFIs.
Introducing FFIs
In this section, we'll understand what FFI is, and then see the two unsafe Rust features
related to FFI.
To understand FFI, let's look at the following two examples:
• You have a high-performance library written in Rust that you want to reuse from a
program written in another language, such as Java or Python.
• You want to call, from Rust, a function in a system library that is written in C.
While there may be other ways to solve these problems, one popular method is to use FFI.
In the first example, you can wrap the Rust library with an FFI defined in Java or Python.
In the second example, Rust has a keyword, extern, with which an FFI to a C function
can be set up and called. Let's see an example of the second case next:
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

extern "C" {
    fn getenv(s: *const c_char) -> *mut c_char;
}

fn main() {
    let c1 = CString::new("MY_VAR").expect("Error");
    unsafe {
        // Note: getenv returns a null pointer if MY_VAR is not set
        println!("env got is {:?}", CStr::from_ptr(getenv(c1.as_ptr())));
    }
}
Here, in the main() function, we are invoking the getenv() external C function
(instead of directly using the Rust Standard Library) to retrieve the value of the MY_VAR
environment variable. The getenv() function accepts a *const c_char type
parameter as input. To create this type, we are first instantiating the CString type,
passing in the name of the environment variable, and then converting it into the required
function input parameter type using the as_ptr() method. The getenv() function
returns a *mut c_char type. To convert this into a Rust-compatible type, we are using
the CStr::from_ptr() function.
Note the two main considerations here:
• We are specifying the call to the C function within an extern "C" block. This
block contains the signature of the function that we want to call. Note that the data
types in the function are not Rust data types, but those that belong to C.
• We are importing a couple of modules – std::ffi and std::os::raw – from
the Rust Standard Library. The ffi module provides utility functions and data
structures related to FFI bindings, which makes it easier to do data mapping across
non-Rust interfaces. We are using the CString and CStr types from the ffi
module, to transfer UTF-8 strings to and from C. The os::raw module contains
platform-specific types that map to the C data types so that the Rust code that
interacts with C will refer to the correct types.
You'll see the value of MY_VAR printed out to the console. With this, we have successfully
retrieved the value of an environment variable using a call to an external C function.
Recall that we learned how to get and set environment variables in previous chapters
using the Rust Standard Library. Now we have done something similar, but this time using
the Rust FFI interface to invoke a C library function. Note that the call to the C function is
enclosed in an unsafe block.
So far, we've seen how to invoke a C function from Rust. Later, in the Calling Rust from
C (project) section, we'll see how to do it the other way around, that is, invoke a Rust
function from C.
Let's now take a look at another feature of unsafe Rust, which is to define and access fields
of a union struct, for communicating with a C function across an FFI interface.
Unions are data structures used in C, and are not memory-safe. This is because in a union
type, you can set an instance of the union to one of its variants and then access it as
another variant. Rust does not directly provide union as a type in safe Rust. Rust,
however, does have a form of tagged union, which is implemented as the enum data type in
safe Rust. Let's see an example of union:
#[repr(C)]
union MyUnion {
    f1: u32,
    f2: f32,
}

fn main() {
    let float_num = MyUnion { f2: 2.0 };
    let f = unsafe { float_num.f2 };
    println!("f is {:.3}", f);
}
In the code shown, we are first using a repr(C) annotation, which tells the compiler that
the order, size, and alignment of fields in the MyUnion union is what you would expect
in the C language (we'll discuss more about repr(C) in the Understanding the ABI
section). We're then defining two variants of the union: one is an integer of type u32
and the other is a float of type f32. For any given instance of this union, only one of
these variants is valid. In the code, we're creating an instance of this union,
initializing it with the float variant, and then accessing its value from an unsafe block.
Run the program with the following:
cargo run
You'll see the value f is 2.000 printed to your terminal. So far, it looks right. Now, let's
try to access the union as an integer, instead of a float type. To do this, just alter one line
of code. Locate the following line:
let f = unsafe { float_num.f2 };
Change f2 to f1, so that the union is read through its integer variant.
Run the program again. This time, you won't get an error, but you'll see an invalid value
printed. The reason is that the value in the memory location is now being interpreted as
an integer, even though we had stored a float value:
f is 1073741824
Using unions in C is dangerous unless it is done with the utmost care, and Rust provides
the ability to work with unions as part of unsafe Rust.
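The "invalid value" above is simply the IEEE-754 bit pattern of the stored float reinterpreted as an integer; you can confirm this without a union (this check is illustrative, not from the book):

```rust
// 2.0_f32 is stored as the IEEE-754 bit pattern 0x4000_0000, which is
// 1073741824 when read as a u32 — the value printed by the union example.
fn float_bits(f: f32) -> u32 {
    f.to_bits()
}
```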
So far, you've seen what unsafe Rust and FFI are. You've also seen examples of calling
unsafe and external functions. In the next section, we'll discuss guidelines for creating
safe FFI interfaces.
• The extern keyword: Any foreign function defined with an extern keyword in
Rust is inherently unsafe, and such calls must be done from an unsafe block.
• Data layout: Rust does not provide guarantees on how data is laid out in memory,
because it takes charge of allocations, reallocations, and deallocations. But when
working with other (foreign) languages, explicit use of a C-compatible layout (using
the #[repr(C)] annotation) is important to maintain memory safety. We've seen an
example earlier of how to use this. Another thing to note is that only C-compatible
types should be used as parameters or return values for external functions.
Examples of C-compatible types in Rust include integers, floats, repr(C)-annotated
structs, and pointers. Examples of Rust types incompatible with C include trait
objects, dynamically sized types, and enums with fields. There are tools available,
such as rust-bindgen and cbindgen, that can help in generating types that are
compatible between Rust and C (with some caveats).
Calling Rust from C (project) 347
This concludes the section on writing safe FFI interfaces. In the next section, we'll see an
example of using a Rust library from C code.
Here are the steps that we will go through to develop and test a working example of a C
program that calls a function from a Rust library using the FFI interface:
The #[no_mangle] annotation tells the Rust compiler that the see_ffi_in_action()
function should be accessible to external programs under that exact name. Otherwise, by
default, the Rust compiler mangles (alters) it.
The function is declared with extern "C". As discussed earlier, the Rust compiler
makes any functions marked with extern callable from C code. The "C" in extern
"C" specifies the standard C calling convention on the target platform. In this
function, we are simply printing out a greeting.
4. Build the Rust shared library from the ffi folder with the following command:
cargo build --release
If the build completes successfully, you'll see a shared library with the name
[Link], created in the target/release directory.
5. Verify whether the shared library has been built correctly:
nm -D target/release/[Link] | grep see_ffi_in_action
If you don't see something similar, the shared library may not have been built
correctly. Please revisit the previous steps. (Note that the shared library is created
with a .dylib extension on the Mac platform.)
6. Let's create a C program that invokes the function from the Rust shared library that
we have built. Create a rustffi.c file in the root of the ffi project folder and
add the following code:
#include "rustffi.h"
int main(void) {
see_ffi_in_action();
}
This is a simple C program that includes a header file and has a main() function
that in turn invokes a see_ffi_in_action() function. At this point, the
C program does not know where this function is located. We'll provide this
information to the C compiler when we build the binary. Let's now write the header
file that's referred to in this program. Create a rustffi.h file in the same folder as
the C source file, and include the following:
void see_ffi_in_action();
This header file declares the function signature, which denotes that this function
does not return any value or take any input parameter.
7. Build the C binary with the following command, from the root folder of the project:
gcc rustffi.c -Ltarget/release -lffitest -o ffitest
More details about the various conditional compilation options can be found at [Link].
• Data layout conventions: Apart from the platform and operating system
considerations, data layout is another aspect that is important to understand,
especially while transferring data across FFI boundaries.
In Rust, as in other languages, type, alignment, and offsets are associated with its
data elements. For example, say you declare a struct of the following type:
struct MyStruct {
    member1: u16,
    member2: u8,
    member3: u32,
}
The compiler may insert padding bytes between members, and may even reorder fields.
This is done in order to reconcile the differences in the integer sizes with the
processor word size. The idea is that the whole struct will have a size that's a
multiple of 32 bits, and there may be multiple layout options to achieve this. This
internal layout for Rust data structures can also be annotated as #[repr(Rust)].
But if there is data that needs to pass through an FFI boundary, the accepted
standard is to use the data layout of C (annotated as #[repr(C)] ). In this layout,
the order, size, and alignment of fields are as it is done in C programs. This is
important to ensure the compatibility of data across the FFI boundary.
Rust guarantees that if the #[repr(C)] attribute is applied to a struct, the layout
of the struct will be compatible with the platform's representation in C. There are
automated tools, such as cbindgen, that can help generate the C data layout from
Rust programs.
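For the MyStruct example above, the C-compatible layout can be checked directly with std::mem (a small illustrative check, not from the book):

```rust
use std::mem;

#[repr(C)]
struct MyStruct {
    member1: u16, // offset 0
    member2: u8,  // offset 2
    // one padding byte here so that member3 is 4-byte aligned
    member3: u32, // offset 4
}
```

Under #[repr(C)], the fields keep their declared order, so the struct occupies 8 bytes with 4-byte alignment.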
• Link options: The third aspect we will cover regarding calling functions from other
binaries is the link annotation. Take the following example:
#[link(name = "my_library")]
extern {
    fn a_c_function() -> c_int;
}
The #[link(...)] attribute is used to instruct the linker to link against my_library
in order to resolve the symbols. It tells the Rust compiler how to link to native
libraries. This annotation can also be used to specify the kind of library to link to
(static or dynamic). The following annotation tells rustc to link to a static library
with the name my_other_library:
#[link(name = "my_other_library", kind = "static")]
In this section, we've seen what an ABI is and its significance. We've also looked at how to
specify instructions to the compiler and linker through various annotations in code, for
aspects such as the target platform, operating system, data layout, and link instructions.
This concludes this section. The intent of this section was only to introduce a few
advanced topics related to the ABI, FFI, and associated instructions to the compiler and
linker. For more details, refer to the Rustonomicon at https://doc.rust-lang.org/nomicon/.
Summary
In this chapter, we reviewed the basics of unsafe Rust and understood the key differences
between safe and unsafe Rust. We saw how unsafe Rust enables us to perform operations
that would not be allowed in safe Rust, such as dereferencing raw pointers, accessing or
mutating static variables, working with unions, implementing unsafe traits, and calling
external functions. We also looked at what a foreign function interface is, and how to
write one in Rust. We wrote an example of invoking a C function from Rust. Also, in the
example project, we wrote a Rust shared library and invoked it from a C program. We saw
guidelines for how to write safe FFIs in Rust. We took a look at the ABI and annotations
that can be used to specify conditional compilation, data layout, and link options.
With this, we conclude this chapter, and also this book.
I thank you for joining me on this journey into the world of system programming with
Rust, and wish you the very best with exploring the topic further.
Index
crates
  about 7, 15
  inline documentation comments, writing 24, 25

D
data layout conventions 352, 353
data segment 146
dependencies
  about 13
  specifying, for cargo package 13
dependencies, specifying
  reference link 17
dependency location
  specifying 16
dependency location specification, ways
  alternative registry 16
  [Link] registry 16
  Git repository 16
  local path, specifying 16
  multiple locations 17
dependency management
  automating 15, 16
dependent packages
  using, in source code 17
derivable traits 90
device drivers
  about 287-289
  need for 289
device I/O fundamentals
  in Linux 287
device types, Unix/Linux
  block devices 289
  character devices 288
  network devices 289
directory 183-186
Django 85
documentation
  writing, in markdown files 26
documentation comments 25
documentation tests
  running 27
domain name system (DNS) 311
dynamic data structure
  about 166
  coding 165-170
  implementing 162
dynamic libraries
  building, with Cargo 15
dynamic lifetime 148
dynamic memory 148
dynamic size 148

E
elision 39
environment variables 146
error handling
  in std::io module 297, 298
  in threads 267-269
errors
  dealing with 58-62
  handling, in processes 243
evaluator
  building 56
executables 7

F
features for ABI, by Rust
  conditional compilation options 351
  data layout conventions 352, 353
  link options 353
file 174
file descriptor (fd) 175, 176
file I/O
  in Rust 177, 179
file operations
  Linux system calls, using for 174-177
file operations, with Rust
  copy operation 180
  create operation 179
  open operation 179
  query operation 181
  read operation 180, 181
  rename operation 180
  write operation 181
filesystem 175
foreign function interfaces (FFIs) 78, 343-346
forward proxy 322
functional requirements, for Cargo project
  command-line tool, building 107, 109

G
garbage collection 160
garbage collector (GC) 143, 160
green threads 264
guidelines, for safe FFIs
  C's pointer types, versus Rust's reference types 347
  data layout 346
  extern keyword 346
  memory management 347
  panic, handling 347
  platform-dependent types 347
  reviewing 346
  Rust library, exposing to foreign language 347

H
hard links
  about 187
  creating, with Rust 187
  setting 187
heap
  about 146
  characteristics 150
high-level programming languages
  versus low-level programming languages 154
HTML template engine
  building 83-85
  data structures 89, 90
  design 89
  executing 103
  key functions 91-93
  main() function design 99, 102
  parser, writing 95
  supporting functions 96-98
  syntax 85-88
  tests, running 103
  types 85
  writing 94
HTTP request line 323

I
image resizing tool, creating with Rust Standard Library
  about 113
  command-line parameters, working with 118, 119
  directory iteration 113-115
  environment variables, working with 116, 117
In Rust, testing strategies involve using the #[cfg(test)] attribute to create a module for test cases and the #[test] attribute for individual tests. These tests validate the behavior of functions like get_content_type(), ensuring templates are correctly tokenized as Literal, TemplateVariable, or Tag. This approach, along with assertions, confirms correctness and reliability, which is crucial in iterative development for identifying and fixing issues early .
The ImageCLI tool handles command-line input through three parameters: size, mode, and source folder. It supports resizing images to small, medium, or large sizes and can process either a single image or all images in a folder. User input controls the resizing process, and the tool reports statistics on the number of files processed and their total size. The core functionality is kept separate from the CLI, allowing future integration into other interfaces.
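A hypothetical sketch of how such parameters might be read with std::env::args; the actual ImageCLI flag names and parsing code are not reproduced here:

```rust
use std::env;

// Hypothetical enums mirroring the three CLI parameters described above.
#[derive(Debug, PartialEq)]
enum SizeOption { Small, Medium, Large }

#[derive(Debug, PartialEq)]
enum Mode { SingleImage, AllImages }

fn parse_size(arg: &str) -> Option<SizeOption> {
    match arg {
        "small" => Some(SizeOption::Small),
        "medium" => Some(SizeOption::Medium),
        "large" => Some(SizeOption::Large),
        _ => None,
    }
}

fn parse_mode(arg: &str) -> Option<Mode> {
    match arg {
        "single" => Some(Mode::SingleImage),
        "all" => Some(Mode::AllImages),
        _ => None,
    }
}

fn main() {
    // Expected invocation (hypothetical): imagecli <size> <mode> <folder>
    let args: Vec<String> = env::args().collect();
    if args.len() < 4 {
        eprintln!("Usage: imagecli <small|medium|large> <single|all> <folder>");
        return;
    }
    let size = parse_size(&args[1]);
    let mode = parse_mode(&args[2]);
    println!("size={:?}, mode={:?}, source={}", size, mode, args[3]);
}
```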
Derivable traits in Rust are traits that the compiler can automatically implement for custom structs and enums. Traits such as Eq, PartialEq, and Debug can be derived using the #[derive] attribute. In the template engine, this allows easy comparison and printing of values: the ContentType enum and ExpressionData struct use #[derive(PartialEq, Debug)] to enable these capabilities, making the template engine easier to develop and test.
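A short sketch using the book's type names, with the field details simplified for illustration:

```rust
// Deriving PartialEq and Debug lets the compiler generate comparison
// and formatting code for these types automatically.
#[derive(PartialEq, Debug)]
enum ContentType {
    Literal(String),
    Tag(String),
}

#[derive(PartialEq, Debug)]
struct ExpressionData {
    head: Option<String>,
    variable: String,
    tail: Option<String>,
}

fn main() {
    let a = ContentType::Literal("hello".to_string());
    let b = ContentType::Literal("hello".to_string());
    assert_eq!(a, b); // PartialEq: compare variants directly
    println!("{:?}", a); // Debug: print with the {:?} formatter
    let e = ExpressionData {
        head: None,
        variable: "name".to_string(),
        tail: None,
    };
    println!("{:?}", e);
}
```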
Rust uses the std::process module to spawn and terminate processes. A process can be spawned with spawn() for asynchronous execution or output() for synchronous execution, and terminated with abort() or exit(), where exit() accepts an explicit exit code. Exit codes communicate success or failure to the calling process, improving error handling by letting specific codes denote specific failure conditions.
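A brief sketch of both styles, assuming a Unix-like system where the `echo` command is available on PATH:

```rust
use std::process::Command;

// Synchronous style: output() runs the child to completion and
// collects its stdout, stderr, and exit status.
fn run_echo(msg: &str) -> String {
    let out = Command::new("echo")
        .arg(msg)
        .output()
        .expect("failed to run echo"); // assumes `echo` exists on PATH
    String::from_utf8_lossy(&out.stdout).trim().to_string()
}

fn main() -> std::io::Result<()> {
    println!("captured: {}", run_echo("hello"));

    // Asynchronous style: spawn() returns a Child handle immediately;
    // wait() later retrieves the exit status.
    let mut child = Command::new("echo").arg("world").spawn()?;
    let status = child.wait()?;
    println!("child exit code: {:?}", status.code());

    // std::process::exit(1) here would end this process with code 1,
    // signalling failure to whoever invoked it.
    Ok(())
}
```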
The memory management lifecycle of a program involves several stages: allocation, use and manipulation, deallocation (release), and usage tracking. Allocation is explicit in low-level languages and automatic in high-level ones. Use and manipulation include defining memory areas, initializing variables, modifying values, and creating references. Deallocation is explicit in low-level languages but automatic in languages with garbage collectors. For tracking, the kernel keeps a record of allocations and releases. Virtual memory management by the OS insulates processes from physical memory limitations by giving each process its own virtual address space.
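The stages above can be sketched in Rust, where heap deallocation is automatic yet deterministic (ownership-based, with no garbage collector):

```rust
// Allocates on the heap, uses the value, and releases it on scope exit.
fn double_on_heap(v: i32) -> i32 {
    let mut b = Box::new(v); // allocation: value placed on the heap
    *b += *b;                // use and manipulation: modify in place
    *b                       // read back through the pointer
}                            // deallocation: `b` is dropped here, no GC needed

fn main() {
    let x = 21; // stack allocation, released when main returns
    assert_eq!(double_on_heap(x), 42);
    println!("done");
}
```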
The Rust template engine processes and classifies template strings using the get_content_type() function, which determines whether a line is a Literal, TemplateVariable, Tag, or Unrecognized. Tags can be for-tags or if-tags, and template variables are extracted into components (head, variable, and tail) using the ExpressionData struct. The classification categories themselves are defined by the ContentType enum.
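A simplified sketch of the head/variable/tail extraction; the book's actual parsing logic may differ in detail:

```rust
// Simplified ExpressionData extraction for lines containing {{ ... }}.
#[derive(Debug, PartialEq)]
struct ExpressionData {
    head: Option<String>,
    variable: String,
    tail: Option<String>,
}

fn get_expression_data(line: &str) -> Option<ExpressionData> {
    let start = line.find("{{")?; // returns None if no template variable
    let end = line.find("}}")?;
    Some(ExpressionData {
        // Keep head/tail only when non-empty, mirroring the Option fields.
        head: Some(line[..start].to_string()).filter(|s| !s.is_empty()),
        variable: line[start + 2..end].trim().to_string(),
        tail: Some(line[end + 2..].to_string()).filter(|s| !s.is_empty()),
    })
}

fn main() {
    let e = get_expression_data("Hello {{ name }}, welcome").unwrap();
    assert_eq!(e.variable, "name");
    assert_eq!(e.head.as_deref(), Some("Hello "));
    println!("{:?}", e);
}
```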
Rust's trait system enables code reuse and type safety through polymorphism and well-defined interfaces. In the context of the template engine, traits like Debug and PartialEq let programmers derive functionality such as printing and comparing enum variants without implementing it from scratch. This aligns with Rust's safety goals, as developers can leverage tested, language-provided abstractions, reducing boilerplate and the potential for errors. Trait bounds also ensure that generic functions operate on a defined interface, increasing flexibility while maintaining strict type contracts.
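A small illustration (not from the book) of a generic function constrained by trait bounds, which works with any type implementing those traits:

```rust
use std::fmt::Debug;

#[derive(Debug, PartialEq)]
enum Token {
    Literal,
    Tag,
}

// T must implement both PartialEq (for ==) and Debug (for {:?}).
fn are_equal<T: PartialEq + Debug>(a: T, b: T) -> bool {
    println!("comparing {:?} and {:?}", a, b);
    a == b
}

fn main() {
    assert!(are_equal(Token::Tag, Token::Tag));
    assert!(!are_equal(Token::Literal, Token::Tag));
    assert!(are_equal(3, 3)); // the same function works for built-in types
}
```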
Rust manages virtual memory and memory layout for both safety and performance through its ownership system and compiler checks. Stack allocations give quick data access and are automatically freed on function exit, while heap allocations are managed through smart pointers such as Box and Rc. Static memory holds immutable global data. Rust prevents misuse by disallowing multiple simultaneous mutable references and by checking lifetimes at compile time. This balance helps prevent the memory errors common in other low-level languages.
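The memory regions described above can be illustrated briefly (variable names here are ours, not the book's):

```rust
use std::rc::Rc;

static GREETING: &str = "hello"; // static memory: lives for the whole program

// Shared ownership of one heap value via reference counting.
fn share(v: i32) -> (Rc<i32>, Rc<i32>) {
    let a = Rc::new(v);    // heap allocation, reference-counted
    let b = Rc::clone(&a); // second owner; strong count becomes 2
    (a, b)
}

fn main() {
    let on_stack = 10;          // stack allocation, freed on function exit
    let on_heap = Box::new(20); // heap allocation via Box, freed on drop
    let (a, b) = share(30);
    assert_eq!(Rc::strong_count(&a), 2);
    assert_eq!(on_stack + *on_heap + *b, 60);
    println!("{GREETING}");
    // Two simultaneous mutable references to one value are rejected at
    // compile time, and every reference's lifetime is checked statically.
}
```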
Rust handles process I/O through the std::process module using pipe-based communication (Stdio::piped) between processes, which allows input and output redirection. A command's standard output can be captured by configuring stdout() on the Command and reading the resulting pipe with methods such as read_to_string(). This makes process interactions and data flow straightforward to manage and is essential for tasks involving parallel execution or inter-process communication.
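A sketch of capturing a child's output over a pipe, again assuming a Unix-like system with `echo` on PATH:

```rust
use std::io::Read;
use std::process::{Command, Stdio};

// Spawn `cmd` with its stdout redirected to a pipe and return what it wrote.
fn capture_stdout(cmd: &str, arg: &str) -> std::io::Result<String> {
    let mut child = Command::new(cmd)
        .arg(arg)
        .stdout(Stdio::piped()) // redirect the child's stdout to a pipe
        .spawn()?;
    let mut buf = String::new();
    child
        .stdout
        .take()
        .expect("stdout was piped")
        .read_to_string(&mut buf)?; // read everything the child wrote
    child.wait()?; // reap the child to avoid a zombie process
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    let out = capture_stdout("echo", "piped output")?;
    println!("parent received: {}", out.trim());
    Ok(())
}
```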
Separating CLI functionality from core library logic, as seen in projects like ImageCLI, makes code more flexible and maintainable. The core logic can be reused or extended for other interfaces such as GUIs or web services. The separation also facilitates testing by isolating user interaction from algorithmic logic, leading to cleaner, more focused test cases specific to either functionality or interface. This modular design encourages collaboration and parallel development of components.
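A toy sketch of this separation (hypothetical names and logic; in a real Cargo project the two layers would typically live in src/lib.rs and src/main.rs):

```rust
mod imaging {
    // Core logic: pure computation, no user I/O -- easy to unit test and
    // to reuse later from a GUI or web service.
    pub fn resized_dimensions(w: u32, h: u32, target_w: u32) -> (u32, u32) {
        let scaled_h = (h as u64 * target_w as u64 / w as u64) as u32;
        (target_w, scaled_h)
    }
}

mod cli {
    // CLI layer: parses and validates user input, then delegates to core.
    pub fn run(args: &[&str]) -> Result<String, String> {
        let w: u32 = args
            .first()
            .ok_or("missing width")?
            .parse()
            .map_err(|_| "bad width")?;
        let h: u32 = args
            .get(1)
            .ok_or("missing height")?
            .parse()
            .map_err(|_| "bad height")?;
        let (nw, nh) = super::imaging::resized_dimensions(w, h, 800);
        Ok(format!("{}x{}", nw, nh))
    }
}

fn main() {
    println!("{:?}", cli::run(&["1600", "1200"]));
}
```

Because `imaging` never touches stdin or stdout, its tests need no user-interaction scaffolding, while `cli` can be tested purely on its argument handling.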