A router matches requests against registered routes and invokes the associated
request handler to return a response. There are many routers available, but
they are often tied to a whole framework. Instead, this article explains how to
handle HTTP requests with Hyper, how to route requests with
Rust pattern matching, and how to handle query parameters, forms, cookies, and
more. It assumes you are familiar with the basics of Linux, HTTP, and Rust. For
full code samples, see the source
files.
The target of an HTTP request corresponds to a resource (e.g., a list of tasks
for /tasks, the first one for /tasks/1). A method defines an action on a
particular resource (e.g., POST /tasks to create a new task, GET /tasks/1
to view the first one). When a resource does not exist, or the method for that
resource is not allowed, the server returns an error to the client.
The method defines the semantics of a request for a particular resource. A
handler associated with a resource receives all the requests for this target,
regardless of the method. Therefore, it is the responsibility of the handler to
allow or deny specific methods, but as an additional guard, the server can deny
methods it does not implement before calling any handler.
A handler receiving a request with a forbidden method returns 405 Method Not Allowed with the Allow header to indicate the list of valid methods for that
resource:
Update index to restrict methods other than GET and HEAD:
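The two steps above can be sketched without any dependency. This is a minimal illustration, not the article's Hyper code: the `Response` struct and the string-based method are stand-ins (assumptions) for Hyper's `Response` and `Method` types.

```rust
/// Stand-in for an HTTP response (hypothetical, not Hyper's type).
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
    headers: Vec<(String, String)>,
    body: String,
}

/// Build a 405 response whose Allow header lists the valid methods.
fn method_not_allowed(allowed: &[&str]) -> Response {
    Response {
        status: 405,
        headers: vec![("Allow".to_string(), allowed.join(", "))],
        body: "Method Not Allowed".to_string(),
    }
}

/// Resource handler for `/`: only GET and HEAD are allowed.
fn index(method: &str) -> Response {
    match method {
        "GET" | "HEAD" => Response {
            status: 200,
            headers: Vec::new(),
            body: "Hello, World!".to_string(),
        },
        _ => method_not_allowed(&["GET", "HEAD"]),
    }
}
```

Note that the list of methods appears twice, once in the pattern and once in the call to `method_not_allowed`.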
A POST request for / returns 405 Method Not Allowed:
When the target is unknown, the server returns 404 Not Found, which is an
example of a client error. By contrast, being unable to connect to the
database is an example of a server error, and it returns 500 Internal Server Error to the client:
In a resource handler, you may have to parse request parameters, query a
database, render templates, all of which can produce errors. Therefore, the
request handlers return a Result<T, E>, where T is a Response. Client
errors are regular responses with the Ok variant, but server errors use the
Err variant.
Define a generic Result<T> type for all the handlers (you can also use a type
defined in crates such as anyhow):
Then, update index to return this type:
Create the handler route, responsible for routing the requests as handle
did previously, but instead returning a Result:
handle has to return a Response for each request, so any error from route
needs to be turned into a 500 Internal Server Error response:
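As a rough sketch of this error-handling flow, the following uses a boxed error as the generic error type and converts any `Err` into a 500 response. The `Response` struct and the path-only signatures are simplifications (assumptions), not Hyper's API.

```rust
use std::error::Error;

/// Generic result type shared by all the handlers.
type Result<T> = std::result::Result<T, Box<dyn Error + Send + Sync>>;

/// Stand-in for an HTTP response (hypothetical, not Hyper's type).
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
    body: String,
}

/// Routing handler: client errors are regular `Ok` responses,
/// server errors use the `Err` variant.
fn route(path: &str) -> Result<Response> {
    match path {
        "/" => Ok(Response { status: 200, body: "Hello, World!".to_string() }),
        // Simulate a failure such as an unreachable database.
        "/error" => Err("cannot connect to the database".into()),
        _ => Ok(Response { status: 404, body: "Not Found".to_string() }),
    }
}

/// Top-level handler: always returns a response, turning errors into 500.
fn handle(path: &str) -> Response {
    route(path).unwrap_or_else(|err| {
        eprintln!("error: {err}");
        Response { status: 500, body: "Internal Server Error".to_string() }
    })
}
```

Logging the error on the server while returning a generic body avoids leaking internal details to the client.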
A request for / returns 200 OK:
A request for /error returns 500 Internal Server Error.
You may have to repeat some operations in multiple handlers. 405 Method Not Allowed responses must have an Allow header, so besides matching the methods
in the pattern, you also have to list them in the response (unless you can
tolerate a slight deviation from RFC 7231).
To reduce code duplication and automatically generate the list of allowed
methods, you can create a declarative
macro:
Update index to make use of it:
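One possible shape for such a macro is sketched below. The name `allow_methods!` and its result type are assumptions made for illustration, and methods are plain strings rather than Hyper's `Method` values.

```rust
/// Check a method against a list of allowed methods. On failure, the
/// error carries the value for the Allow header, built from the same list.
macro_rules! allow_methods {
    ($method:expr, $($allowed:literal),+ $(,)?) => {{
        const ALLOWED: &[&str] = &[$($allowed),+];
        if ALLOWED.contains(&$method) {
            Ok(())
        } else {
            Err(ALLOWED.join(", "))
        }
    }};
}

/// Returns (status, Allow header value if any, body).
fn index(method: &str) -> (u16, Option<String>, &'static str) {
    match allow_methods!(method, "GET", "HEAD") {
        Ok(()) => (200, None, "Hello, World!"),
        Err(allow) => (405, Some(allow), "Method Not Allowed"),
    }
}
```

The list of methods is now written once, and the Allow header can no longer drift from the methods that are actually accepted.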
With the method POST for /, the server returns 405 Method Not Allowed
including the list of allowed methods:
Procedural macros
may provide a more efficient implementation, which I leave as an exercise for
the reader...
A route associates an HTTP target with a resource. Up until now, route relies
on a naive string comparison with the target. By leveraging Rust pattern
matching, it is possible to match and extract parameters from dynamic routes.
A URL is formed of / separated components. A simple way to match a URL is to
split it into segments. For instance, /hello/world is "equivalent" to the
segments ["hello", "world"].
First, add a new function to build a 200 OK response with a UTF-8 encoded
plain text body:
Add the handler hello, which returns a custom "Hello, World!" based on its
argument name:
Update route to accept segments as a parameter and match against them:
Update handle to split the path into segments (dropping empty components
between duplicate /):
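The splitting and matching steps can be sketched with the standard library alone. Here `route` returns a `(status, body)` tuple as a stand-in (an assumption) for a real response.

```rust
/// Split a URL path into segments, dropping the empty components
/// produced by leading, trailing, or duplicate slashes.
fn segments(path: &str) -> Vec<&str> {
    path.split('/').filter(|segment| !segment.is_empty()).collect()
}

/// Match the segments with slice patterns and extract parameters.
fn route(segments: &[&str]) -> (u16, String) {
    match segments {
        // `/`
        [] => (200, "index".to_string()),
        // `/hello/<name>`: the second segment is bound to `name`
        ["hello", name] => (200, format!("Hello, {name}!")),
        _ => (404, "Not Found".to_string()),
    }
}
```

For example, `route(&segments("/hello/world"))` reaches the second arm with `name` bound to `world`.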
It is impossible to distinguish /hello and /hello/ from the segments alone,
but you will see how to handle this situation in § Handlers § Trailing
slashes.
Akin to filesystem paths, URL paths can contain . and .. components, such
that /world/../hello/. is equivalent to /hello. Transforming a path into
its shortest equivalent, by eliminating these components, is called
normalization. Behind a reverse proxy and with well-behaved clients, you may
already receive normalized URLs; otherwise, you can add normalization to
handle:
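A minimal sketch of normalization, assuming it is applied while splitting the path into segments (the function name `normalize` is hypothetical):

```rust
/// Split a path into segments while resolving `.` and `..` components.
/// A `..` at the root has nothing to remove and is ignored.
fn normalize(path: &str) -> Vec<&str> {
    let mut segments = Vec::new();
    for component in path.split('/') {
        match component {
            // Empty components and `.` do not change the location.
            "" | "." => {}
            // `..` removes the previous segment, if any.
            ".." => {
                segments.pop();
            }
            segment => segments.push(segment),
        }
    }
    segments
}
```

With this function, `/world/../hello/.` and `/hello` yield the same segments and are therefore routed identically.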
As opposed to static URLs, dynamic URLs contain parameters. For example,
/tasks/3 and /tasks/42 correspond to the same pattern /tasks/<id>, where
the id segment is dynamic. These parameters must be extracted by a routing
handler and passed to a resource handler as arguments.
You can also bind multiple segments (forming a sub-slice) to a variable with
the .. placeholder (matching zero or more elements). To extract a path, you
can join these segments with /:
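For instance, a sub-slice binding could look like the sketch below. The `static` prefix and the function name `asset_path` are hypothetical choices made for illustration.

```rust
/// Extract the path that follows the `static` prefix, if any.
fn asset_path(segments: &[&str]) -> Option<String> {
    match segments {
        // `rest` binds the remaining segments (possibly none) as a sub-slice.
        ["static", rest @ ..] => Some(rest.join("/")),
        _ => None,
    }
}
```

Since `..` also matches zero elements, `["static"]` alone produces an empty path, which the resource handler has to deal with.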
The routing handlers extract segments from the target and pass them as a string
to the resource handlers. Then, it is the responsibility of the resource
handler to parse these parameters into their expected type.
Since both /hello and /hello/ correspond to the same segment hello, a
request to any of these targets is routed to the hello handler:
Although these two URLs point to the same resource, they are not strictly
identical. The page is duplicated, which is undesirable from an SEO
perspective. Additionally, it would be extremely surprising if /hello and
/hello/ pointed to different resources. Therefore, each handler should
enforce a canonical URL with or without a trailing /.
From a historical point of view, a trailing / indicates a directory, whereas
no trailing / indicates a file. For directories, the server returns the
content of ./index.html. For example, a static website might have the
following layout:
Each folder encapsulates a piece of content. From
/website/posts/my-first-post/, you can link to the image easily with
./image.jpg.
Dynamic web servers can return whatever they want for any URL, but I try to
follow these rules:
- For static assets, no trailing slashes.
- For HTML pages, trailing slashes.
- For API endpoints, no trailing slashes.
Feel free to follow any rules you want, as long as you stay consistent. To
redirect the client, use a 308 Permanent Redirect and a Location header for
the target URL:
If you want to enforce no trailing slashes, you can redirect the client when
the path ends with /:
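The check itself can be sketched as follows. The function names are hypothetical, the handler returns a `(status, Location)` tuple instead of a real response, and query strings are ignored for brevity.

```rust
/// Return the canonical path if `path` has a trailing slash to remove.
/// The root path `/` is left untouched.
fn canonical_location(path: &str) -> Option<String> {
    if path.len() > 1 && path.ends_with('/') {
        let trimmed = path.trim_end_matches('/');
        // A path made only of slashes collapses to the root.
        Some(if trimmed.is_empty() { "/".to_string() } else { trimmed.to_string() })
    } else {
        None
    }
}

/// Resource handler: redirect permanently when the URL is not canonical.
fn hello(path: &str) -> (u16, Option<String>) {
    match canonical_location(path) {
        Some(location) => (308, Some(location)),
        None => (200, None),
    }
}
```

In a real handler, the second element of the tuple would become the value of the Location header.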
Now, /hello/ redirects permanently to /hello:
You can handle the redirection in a routing handler (or even the reverse proxy)
if all the resource handlers use the same convention, but if at some point you
want to serve something that resembles a file, like an Atom feed, then it
becomes an issue.
Finally, to reduce code duplication, you can use the following macros:
The URL segments are string slices, so you have to parse them into their
expected type. For an invalid request, the server returns 400 Bad Request:
In the following example, the parameter is parsed as a usize that indicates
the language id for the response. If the parameter is not a number, the handler
returns 400 Bad Request; if it is out of range, it returns 404 Not Found:
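A sketch of such a handler is shown below. The list of greetings is made up for illustration, and the `(status, body)` tuple stands in for a real response.

```rust
/// Hypothetical list of greetings, indexed by language id.
const GREETINGS: [&str; 4] = ["Hello", "Bonjour", "Hola", "Hallo"];

/// Parse the id segment, then look up the greeting.
fn hello(id: &str) -> (u16, String) {
    // Not a valid number: the request itself is malformed.
    let index: usize = match id.parse() {
        Ok(index) => index,
        Err(err) => return (400, format!("invalid id {id:?}: {err}")),
    };
    // A valid number without a matching entry: the resource does not exist.
    match GREETINGS.get(index) {
        Some(greeting) => (200, format!("{greeting}, World!")),
        None => (404, "Not Found".to_string()),
    }
}
```

Using `get` instead of indexing avoids a panic when the id is out of range.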
Update route to pass the id to hello:
With 3, it works as expected:
With 42, it returns 404 Not Found, since the index is out-of-range:
Finally, if the parameter contains characters other than digits, the server
returns 400 Bad Request with an error message:
Besides the method and the target, a request can contain URL encoded query
parameters after the delimiter ?, such as the parameter lang with the value
en in /hello?lang=en.
These parameters are accessible with req.uri().query(), and you can parse
them as a key/value list with the crate form_urlencoded:
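To illustrate the shape of the result, here is a deliberately naive, dependency-free sketch that splits a query string into key/value pairs. Unlike the form_urlencoded crate, it performs no percent-decoding and does not translate `+` into spaces, so it is not a replacement for it. The function names are hypothetical.

```rust
/// Split a query string into raw key/value pairs (no percent-decoding).
fn parse_query(query: &str) -> Vec<(&str, &str)> {
    query
        .split('&')
        .filter(|pair| !pair.is_empty())
        // A key without `=` gets an empty value.
        .map(|pair| pair.split_once('=').unwrap_or((pair, "")))
        .collect()
}

/// Return the raw value of the `lang` parameter of a request target.
fn lang(target: &str) -> Option<&str> {
    let (_, query) = target.split_once('?')?;
    parse_query(query)
        .into_iter()
        .find(|(key, _)| *key == "lang")
        .map(|(_, value)| value)
}
```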
HTML forms commonly send their data as URL encoded query parameters, except when
the method attribute on the form element is set to POST. In this case,
the parameters are submitted, by default with the same encoding, through the body of the request.
Reading from the body is an asynchronous operation, so you need to make
hello, route, and handle asynchronous by:
- Replacing fn with async fn.
- Calling them with .await.
Update hello to allow POST and extract the name parameter from the
request body (you need a mutable reference to the request because this
operation consumes the body):
Update route to pass a &mut Request:
In handle, mark the request as mutable and pass a mutable reference to
route:
With the method GET:
With the method POST, pass a name with the -d option:
With a router, middlewares are needed to perform operations before or after the
handler associated with a route is invoked. With an explicit routing approach,
you can perform any operation directly.
Behind a reverse proxy, the remote address corresponds to the IP address of the
proxy itself. The real client IP address can be transmitted in the header
X-Forwarded-For. You can extend the request to extract and save this address
in its extension map:
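The extraction logic can be sketched independently of Hyper. In this illustration, the header value and the peer address are passed as plain arguments, and the function name `client_ip` is an assumption.

```rust
use std::net::IpAddr;

/// Pick the client address: the first entry of X-Forwarded-For if it
/// parses as an IP address, otherwise the address of the peer.
fn client_ip(forwarded_for: Option<&str>, peer: IpAddr) -> IpAddr {
    forwarded_for
        // The header may hold a comma-separated chain of addresses.
        .and_then(|value| value.split(',').next())
        .and_then(|first| first.trim().parse().ok())
        .unwrap_or(peer)
}
```

Since any client can send this header, it should only be trusted when the server really sits behind a reverse proxy that sets or sanitizes it.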
Update handle to use this extension:
Make a regular request:
Then, make another request with the header X-Forwarded-For, as if the server
were behind a reverse proxy:
A server can set cookies for a client with the header Set-Cookie. On each
subsequent request, the client sends back the cookies in the header Cookie.
The crate cookie manages cookies in a CookieJar: you can look up their
values, add new ones, modify them, and most importantly, get the delta after
you have made changes.
Extend Request to extract the original cookies from the headers and return a
CookieJar:
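To show what this extraction involves, the sketch below parses a Cookie header into a map using only the standard library. It is a simplified stand-in for the cookie crate's CookieJar: it does not decode values or track changes, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Parse the value of a Cookie header (`name=value; name=value`)
/// into a map. Pairs without `=` are skipped.
fn parse_cookies(header: &str) -> HashMap<&str, &str> {
    header
        .split(';')
        .filter_map(|pair| pair.trim().split_once('='))
        .collect()
}

/// Look up a single cookie by name.
fn cookie<'a>(header: &'a str, name: &str) -> Option<&'a str> {
    parse_cookies(header).get(name).copied()
}
```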
This example uses a cookie to perform authentication. It relies on a session
cookie named session_id with a special value for administrator sessions:
In practice, you would generate a new id for each session, and store it in a
database with the associated user, privileges, expiration time, etc. If a
client tries to access a restricted page without the appropriate permission,
the server responds with 403 Forbidden:
The admin handler authenticates the client by comparing the session cookie
value with the admin session id:
To log in, the client must send a valid password to receive the admin session
cookie:
Update route to forward requests to admin and login:
Without a valid session id, the admin page returns 403 Forbidden:
To get the admin cookie, POST the password to the login endpoint (a web
browser would then save this cookie and follow the redirection):
In a real web application, the handlers would need to access external or shared
resources: the application settings, a database, a templating engine, a job
queue. These resources can be shared with the handlers through a context passed
along with the request.
To share the same data between multiple threads, you need to use an Arc<T>
pointer, where T is your Context. It contains a boolean setting to enable
debug mode and a Vec<String> that emulates a table in a database with a
single text column. To allow mutable access, place the Vec inside a
tokio::sync::Mutex:
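The sharing pattern can be sketched with the standard library. Note the swap: this example uses `std::sync::Mutex` and OS threads instead of `tokio::sync::Mutex` and async tasks, so that it runs without any dependency. The helper names are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared application state: a setting and an emulated database table.
struct Context {
    debug: bool,
    rows: Mutex<Vec<String>>,
}

/// Append a row and return its index.
fn add_row(ctx: &Context, text: &str) -> usize {
    let mut rows = ctx.rows.lock().unwrap();
    rows.push(text.to_string());
    if ctx.debug {
        eprintln!("added row {}", rows.len() - 1);
    }
    rows.len() - 1
}

/// Return a copy of the text stored in the given row, if it exists.
fn get_row(ctx: &Context, id: usize) -> Option<String> {
    let rows = ctx.rows.lock().unwrap();
    rows.get(id).cloned()
}

/// Add `count` rows from as many threads, each holding its own Arc clone.
fn fill_from_threads(ctx: &Arc<Context>, count: usize) {
    let workers: Vec<_> = (0..count)
        .map(|i| {
            let ctx = Arc::clone(ctx);
            thread::spawn(move || {
                add_row(&ctx, &format!("row {i}"));
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
}
```

Cloning the `Arc` only increments a reference count; all the clones point to the same `Context`, and the mutex serializes access to the rows.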
For the method GET, the list handler returns the list of rows; for the
method POST, it adds a new row based on the form parameter text:
The details handler returns the text from the given row:
Update route to call these two resource handlers and pass a mutable request:
Update handle to pass a mutable request to route:
In main, the context is initialized, put inside an Arc, then it is cloned
and shared with the handlers:
If you decide to build a web server from scratch, I think Rust and Hyper are
a solid alternative to micro-frameworks such as Flask. Hyper provides the
building blocks for asynchronous, safe, "low-level" HTTP handling, and Rust
pattern matching is powerful enough to replace a router. Starting from here,
you can add anything to the resource handlers. For example, this website relies
on the same techniques, with the addition of content and asset management, an
SQLite database, HTML templating, etc.