Notes on Python Starlette
Sun Aug 18 2024, E.W.Ayers
Starlette is a Python library for writing HTTP servers. This is a set of notes on how it works. If you just want to use Starlette, the docs are very good. I wrote this because I found myself trawling the source code a lot.
ASGI
The first piece of the puzzle is ASGI. This is a protocol that the server (eg Uvicorn) uses to communicate with the application. The server normally imports the application and calls it in the same process.
Transport
An ASGI Application is a Python callable with the signature:
ASGI Application signature.
from typing import Awaitable, Callable, Protocol

type Value = bytes | str | int | float | list[Value] | dict[str, Value] | bool | None
type Scope = dict[str, Value]
type Event = dict[str, Value]
type ReceiveFn = Callable[[], Awaitable[Event]]
type SendFn = Callable[[Event], Awaitable[None]]

class ASGIApp(Protocol):
    def __call__(
        self,
        scope: Scope,
        receive: ReceiveFn,
        send: SendFn,
    ) -> Awaitable[None]:
        ...
Each new connection the server receives will invoke this app function.
The receive and send functions amount to defining a transport for communicating with your connection peer.
Middleware is a wrapper around an ASGI app that is itself an ASGI app. Eg you could have a middleware that does auth.
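As a rough illustration of middleware, here is a sketch of a function that takes an ASGI app and returns a new ASGI app that appends one header to every HTTP response. The names add_header_middleware, hello_app, run_once and the x-demo header are made up for this example, and the in-memory receive/send functions stand in for a real server.

```python
import asyncio

def add_header_middleware(app, name: bytes, value: bytes):
    """Wrap an ASGI app so every HTTP response carries an extra header."""
    async def wrapped(scope, receive, send):
        if scope["type"] != "http":
            # Pass through anything that is not an HTTP connection.
            await app(scope, receive, send)
            return

        async def send_with_header(event):
            # Only the response-start event carries headers.
            if event["type"] == "http.response.start":
                headers = list(event.get("headers", []))
                headers.append((name, value))
                event = {**event, "headers": headers}
            await send(event)

        await app(scope, receive, send_with_header)
    return wrapped

async def hello_app(scope, receive, send):
    """Hypothetical inner app: always answers 200 with a fixed body."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"hello"})

async def run_once(app):
    """Drive an app by hand and collect the events it sends."""
    sent = []
    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}
    async def send(event):
        sent.append(event)
    await app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return sent

events = asyncio.run(run_once(add_header_middleware(hello_app, b"x-demo", b"1")))
print(events[0]["headers"])
```

Because the wrapper has the same signature as the app it wraps, any number of these can be stacked.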
Then on top of this, there are a load of standards for the contents passed in the various dictionaries. This is called the protocol.
The "type" value in scope defines the protocol.
Every Event dict must have a "type" key saying what kind of message it is.
Afaict, the type is always prefixed with the protocol name.
HTTP protocol
The scope value in the case of HTTP is given here.
This contains all of the information you need to handle the http request.
The only weird thing is the "state" key, which is to do with lifespans.
The types below use my made-up syntax for typed Python dictionaries. They are non-exhaustive, but give a flavour of what the data looks like.
HTTP ASGI protocol dictionary types.
type HttpConnectionScope = {
    "type": Literal["http"],
    "scheme": Literal["http", "https"],
    "path": str,  # HTTP request target excluding any query string
    "method": Literal["POST", "GET", ...],
    "query_string": bytes,  # raw bytes, without the leading "?"
    "headers": list[tuple[bytes, bytes]],  # header names and values are bytes
    "state"?: dict[str, Any],
    ...  # other stuff
}
type HttpReceiveEvent = {
    "type": Literal["http.request"],
    "body": bytes,  # the body of the request
    "more_body": bool,  # whether there is more body to come
} | {
    "type": Literal["http.disconnect"],
}
type HttpSendEvent = {
    "type": Literal["http.response.start"],
    "status": int,  # http status code
    "headers": list[tuple[bytes, bytes]],  # http headers
} | {
    "type": Literal["http.response.body"],
    "body": bytes,  # the body of the response
    "more_body": bool,  # whether there is more body to come
}
There is a similar protocol for websockets.
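To make the HTTP events concrete, here is a minimal sketch of an app that reads the whole request body (following "more_body") and echoes it back. The names echo_app and drive are invented for this example, and drive is only a stand-in for what a server like Uvicorn would do.

```python
import asyncio

async def echo_app(scope, receive, send):
    """Read the whole request body, then send it back as the response."""
    assert scope["type"] == "http"
    body = b""
    while True:
        event = await receive()
        if event["type"] == "http.disconnect":
            return  # client went away before the request finished
        body += event.get("body", b"")
        if not event.get("more_body", False):
            break
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"application/octet-stream"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body, "more_body": False})

async def drive(app, chunks):
    """Stand-in for a server: feed body chunks in, collect sent events."""
    incoming = [
        {"type": "http.request", "body": chunk, "more_body": i < len(chunks) - 1}
        for i, chunk in enumerate(chunks)
    ]
    sent = []
    async def receive():
        return incoming.pop(0)
    async def send(event):
        sent.append(event)
    scope = {"type": "http", "scheme": "http", "method": "POST",
             "path": "/echo", "query_string": b"", "headers": []}
    await app(scope, receive, send)
    return sent

sent = asyncio.run(drive(echo_app, [b"hel", b"lo"]))
print(sent[1]["body"])  # b'hello'
```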
Lifespans
There is a special protocol for managing startup and shutdown of the server, called the lifespan protocol.
Lifespan ASGI protocol dictionary types.
type LifespanScope = {
    "type": Literal["lifespan"],
    "state"?: dict[str, Any],
}
type LifespanReceiveEvent = {
    "type": Literal["lifespan.startup"],
} | {
    "type": Literal["lifespan.shutdown"],
}
type LifespanSendEvent = {
    "type": Literal["lifespan.startup.complete"],
} | {
    "type": Literal["lifespan.startup.failed"],
    "message"?: str,
} | {
    "type": Literal["lifespan.shutdown.complete"],
} | {
    "type": Literal["lifespan.shutdown.failed"],
    "message"?: str,
}
How it works:
- The server will call asgi_app with a scope dictionary set to {"type": "lifespan", "state": {}}; it keeps a reference to the "state" dictionary.
- The app function asgi_app will await receive() a "lifespan.startup" event.
- The app function will then do whatever it needs to do to start up, including setting values on the scope['state'] dictionary.
- The app function will await send({"type": "lifespan.startup.complete"}). The server will then start processing requests. Each request will call asgi_app with scope dictionaries that have a shallow copy of the state dictionary.
- The app function will await receive() a "lifespan.shutdown" event, which will resolve when the server is shutting down.
- The app function will then do whatever it needs to do to shut down.
- The app function will await send({"type": "lifespan.shutdown.complete"}) (or a "lifespan.shutdown.failed" message if it failed), and the server will exit.
The "state" dictionary is a good place to put things like database connections, so that they can be shared between requests.
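The exchange can be sketched with a pair of queues playing the server's role. This is only an illustration under stated assumptions: lifespan_app, fake_server and the "greeting" key are hypothetical names, and a real app would open something like a database pool where the sketch stores a string.

```python
import asyncio

async def lifespan_app(scope, receive, send):
    """Handle only the lifespan protocol; other scope types are ignored here."""
    if scope["type"] != "lifespan":
        return
    while True:
        event = await receive()
        if event["type"] == "lifespan.startup":
            try:
                # Hypothetical resource; a real app might open a DB pool here.
                scope["state"]["greeting"] = "hello"
            except Exception as exc:
                await send({"type": "lifespan.startup.failed", "message": str(exc)})
                return
            await send({"type": "lifespan.startup.complete"})
        elif event["type"] == "lifespan.shutdown":
            scope["state"].clear()  # release resources
            await send({"type": "lifespan.shutdown.complete"})
            return

async def fake_server(app):
    """Stand-in for the server side of the lifespan exchange."""
    state = {}  # the server keeps a reference to this dictionary
    to_app = asyncio.Queue()
    from_app = asyncio.Queue()
    task = asyncio.create_task(
        app({"type": "lifespan", "state": state}, to_app.get, from_app.put)
    )
    await to_app.put({"type": "lifespan.startup"})
    started = await from_app.get()
    # Request scopes would receive a shallow copy of the state at this point.
    state_during_run = dict(state)
    await to_app.put({"type": "lifespan.shutdown"})
    stopped = await from_app.get()
    await task
    return started, state_during_run, stopped

started, state_during_run, stopped = asyncio.run(fake_server(lifespan_app))
print(started, state_during_run, stopped)
```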
There is a caveat to using the lifespan protocol though, which is that it is only called once for the lifetime of the server, not for each worker thread/process in the server. This can cause nasty bugs with DB connections. For example, SQLAlchemy connections are not multiprocess-safe, so you can't keep a connection object on the state dictionary, because it will be copied between uvicorn worker processes.
Starlette
Now we are ready to talk about Starlette. Starlette is a Python library for creating ASGI applications that map HTTP requests to handler functions.
Routers and Routes
Starlette does this using the Router and BaseRoute classes.
Here is a simplified version of the code for these classes.
Simplified excerpt for Starlette routing. source
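from dataclasses import dataclass
from enum import Enum
from typing import Any, AsyncContextManager, Callable

from starlette.datastructures import URLPath

class Match(Enum):
    NONE = 0
    """The route does not match the scope"""
    PARTIAL = 1
    """The route matches the scope, but it should be given
    lower priority if any other routes are a full match."""
    FULL = 2
    """The route matches the scope"""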
class BaseRoute:
    def matches(self, scope: Scope) -> tuple[Match, Scope]:
        """A predicate function to determine whether the
        request scope will match with this route.
        Returns:
            Match: whether the route matches the scope.
            Scope: a new scope that will be merged with the
                original scope and passed to the handler function.
        """
        raise NotImplementedError()

    def url_path_for(self, name: str, **path_params) -> URLPath:
        """Generate a URL from a route name and path parameters.
        The route name is some string that internally identifies the route.
        """
        raise NotImplementedError()

    async def handle(self, scope: Scope, receive: ReceiveFn, send: SendFn):
        """Handle the request using ASGI protocol."""
        raise NotImplementedError()

    async def __call__(self, scope, receive, send):
        match, child_scope = self.matches(scope)
        if match == Match.NONE:
            # not_found sends a 404 response
            return await not_found(scope, receive, send)
        scope.update(child_scope)
        return await self.handle(scope, receive, send)
type Lifespan = Callable[[ASGIApp], AsyncContextManager[dict[str, Any]]]
# I've changed the signature of middleware slightly to
# make it clear it's just a function on ASGI apps
type Middleware = Callable[[ASGIApp], ASGIApp]
@dataclass
class Router:
    routes: list[BaseRoute]
    lifespan: Lifespan
    middleware: list[Middleware]

    async def app(self, scope, receive, send):
        """ASGI app function _before_ middleware is applied."""
        if scope["type"] == "lifespan":
            # lifespan() is as described in the lifespan protocol,
            # using Lifespan type as you would expect
            return self.lifespan(scope)
        partial = None
        for route in self.routes:
            match, child_scope = route.matches(scope)
            if match == Match.FULL:
                scope.update(child_scope)
                return await route.handle(scope, receive, send)
            elif match == Match.PARTIAL and partial is None:
                # remember only the first partial match
                partial = route
                partial_scope = child_scope
        if partial is not None:
            scope.update(partial_scope)
            await partial.handle(scope, receive, send)
            return
        # ... some extra logic here so that if the route path ends
        # with a slash we redirect to the same path without the slash
        # not_found will pump out a 404 error
        return await self.not_found(scope, receive, send)

    def __call__(self, scope, receive, send):
        """ASGI app function _after_ middleware is applied."""
        app = self.app
        # reversed so that the first middleware in the list ends up outermost
        for middleware in reversed(self.middleware):
            app = middleware(app)
        return app(scope, receive, send)
That's all Starlette is doing at its core.
There is a class Starlette that afaict is just a wrapper around a Router instance, with some convenience methods for adding routes and default middleware for exception handling.
You could just use the bare Router class as the ASGI app and everything would still work.
One important note about lifespans is that the lifespan events are not forwarded to the routes. Only the root Router will receive and process the lifespan events.
I think this makes sense because routes can be added dynamically, and it's not clear whether late-joining routes should have a lifespan event or not.
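The FULL versus PARTIAL distinction is easier to see with a toy example. The sketch below is not Starlette code: ToyRoute and pick_route are hypothetical, and it assumes a partial match means "the path matches but the HTTP method does not", which is how Starlette's Route appears to use it.

```python
from enum import Enum

class Match(Enum):
    NONE = 0
    PARTIAL = 1
    FULL = 2

class ToyRoute:
    """Hypothetical route: an exact path plus a set of allowed methods."""
    def __init__(self, path, methods, name):
        self.path = path
        self.methods = set(methods)
        self.name = name

    def matches(self, scope):
        if scope["path"] != self.path:
            return Match.NONE, {}
        child_scope = {"endpoint": self.name}
        if scope["method"] not in self.methods:
            # Right path, wrong method: only a partial match.
            return Match.PARTIAL, child_scope
        return Match.FULL, child_scope

def pick_route(routes, scope):
    """First FULL match wins; otherwise fall back to the first PARTIAL."""
    partial = None
    for route in routes:
        match, _ = route.matches(scope)
        if match is Match.FULL:
            return route.name, match
        if match is Match.PARTIAL and partial is None:
            partial = route
    if partial is not None:
        return partial.name, Match.PARTIAL
    return None, Match.NONE

routes = [
    ToyRoute("/items", {"GET"}, "list_items"),
    ToyRoute("/items", {"POST"}, "create_item"),
]

print(pick_route(routes, {"path": "/items", "method": "POST"}))
print(pick_route(routes, {"path": "/items", "method": "DELETE"}))
print(pick_route(routes, {"path": "/nope", "method": "GET"}))
```

The POST request skips the first route's partial match and lands on the second route's full match. The DELETE request falls back to the first partial match, which is where a real handler could answer with a "method not allowed" error instead of a 404.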
Route implementations
Starlette comes with some implementations of BaseRoute that you can use to create routes.
- Route takes a path string like "/foo/{bar}", an http method "GET" and a callback function, and will invoke the callback if the path matches the request path (parsing out parameters like {bar}). You can also pass regex patterns to match the path. See the compile_path function in routing.py for more details. Route's callback function will not use ASGI and instead use convenience classes Request and Response for working with HTTP requests and responses. These manage details of HTTP such as streaming responses and requests, forms, and making sure headers are set properly.
- WebSocketRoute is the same as Route but for websockets instead of HTTP requests.
- Mount takes a path prefix, an ASGI app or a list of routes and makes a 'sub-app' that will match requests with the path prefix.
- Host will look at the Host: http header and match the request to a sub-app based on the host.
- StaticFiles will serve static files from a directory.
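To give a feel for how a path string like "/foo/{bar}" can be matched, here is a rough sketch that turns the template into a regular expression with one named group per parameter. It is not Starlette's compile_path: compile_toy_path and match_path are hypothetical names, and this version treats every parameter as a plain string segment.

```python
import re

def compile_toy_path(path: str) -> re.Pattern:
    """Turn "/foo/{bar}" into a regex with one named group per parameter."""
    pattern = "^"
    pos = 0
    for m in re.finditer(r"\{([a-zA-Z_][a-zA-Z0-9_]*)\}", path):
        # Literal text before the parameter is escaped and matched verbatim.
        pattern += re.escape(path[pos:m.start()])
        # The parameter itself matches one path segment (no slashes).
        pattern += f"(?P<{m.group(1)}>[^/]+)"
        pos = m.end()
    pattern += re.escape(path[pos:]) + "$"
    return re.compile(pattern)

def match_path(pattern: re.Pattern, request_path: str):
    """Return the extracted path parameters, or None if there is no match."""
    m = pattern.match(request_path)
    return m.groupdict() if m else None

foo = compile_toy_path("/foo/{bar}")
print(match_path(foo, "/foo/42"))        # {'bar': '42'}
print(match_path(foo, "/foo/42/extra"))  # None
```

The extracted parameters are the kind of thing a route would hand back in the child scope returned from matches().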
Extra things Starlette does
- Middleware library for things like auth, cors, gzip, error management etc.
- A simple OpenAPI schema generator
- Response classes for forms, streaming, jinja templates, etc.
- A test client for unit tests.
FastAPI
FastAPI is another library that builds on top of Starlette to provide some convenience features:
- It will use type annotations on router handlers to automatically perform validation and OpenAPI documentation generation.
- It has a type-annotation-based dependency injection system that can automatically inject things like database connections into handler functions.
- That is it.
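The idea of driving validation from type annotations can be sketched with the standard library alone. This is not how FastAPI is implemented: call_with_query and get_cheese are hypothetical, and the sketch only handles annotations that are simple callables such as int or str.

```python
import inspect

def call_with_query(handler, query: dict[str, str]):
    """Convert raw query strings to the handler's annotated types, then call it."""
    kwargs = {}
    for name, param in inspect.signature(handler).parameters.items():
        if name in query:
            target = param.annotation
            if target is inspect.Parameter.empty:
                target = str  # unannotated parameters stay as strings
            try:
                kwargs[name] = target(query[name])
            except ValueError as exc:
                raise ValueError(f"invalid value for {name!r}: {query[name]!r}") from exc
        elif param.default is inspect.Parameter.empty:
            raise ValueError(f"missing required parameter {name!r}")
    return handler(**kwargs)

def get_cheese(strength: int, sort: str = "asc"):
    """Hypothetical handler whose signature doubles as its schema."""
    return {"strength": strength, "sort": sort}

print(call_with_query(get_cheese, {"strength": "10"}))
```

The same signature inspection is what makes it possible to generate documentation from a handler: the parameter names, types and defaults are all available without running it.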
Appendix: URLs
An example URL.
https://hello.example.com:443/cheese/cheddar?strength=10&sort=asc#nutritional-info
And here are the bits you have to know
- scheme = the bit before ://, ie https
- authority or host = hello.example.com:443. The // indicates the next part is the authority.
  - domain = hello.example.com. Alternatively to a domain we can have a straight-up IP address.
    - subdomain = hello
    - domain name = example.com
    - top-level domain = com
  - port = 443
- path = /cheese/cheddar
- query = ?strength=10&sort=asc, a set of key-value pairs separated by & and starting with ?
- anchor = #nutritional-info, an identifier that points to a specific part of the page (eg in html, it's the id attribute of an element)
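The same breakdown can be checked with Python's standard urllib.parse module. Note that it calls the authority "netloc" and the anchor "fragment", and that it strips the leading ? and # from the query and fragment.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://hello.example.com:443/cheese/cheddar?strength=10&sort=asc#nutritional-info"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # hello.example.com:443
print(parts.hostname)  # hello.example.com
print(parts.port)      # 443
print(parts.path)      # /cheese/cheddar
print(parts.query)     # strength=10&sort=asc
print(parts.fragment)  # nutritional-info

# The query string decodes to a mapping from key to a list of values.
query = parse_qs(parts.query)
print(query)           # {'strength': ['10'], 'sort': ['asc']}

# Naive split of the domain; this is wrong for suffixes like "co.uk".
labels = parts.hostname.split(".")
print(labels)          # ['hello', 'example', 'com']
```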
So when we write an http router for the server, we can condition on:
- the url subdomain
- the url path
- the url query
- the http headers
- the http method
- the http request body
So there are 6 ways of passing information to the router, and they are all used in different ways on different servers.