The Gnutella protocol

Last update: 15 April 2000

Updated PUSH request routing instructions. Please comment. gene@wego.com


Notes
Everything is in network byte order unless otherwise noted. Byte order of the GUID is not important.

Apparently, there is some confusion as to what "\r" and "\n" are. Well, \r is carriage return, or 0x0d, and \n is newline, or 0x0a. This is standard ASCII, but there it is, from "man ascii".

Keep in mind that every message you send can be replied to by multiple hosts. Hence, Ping is used to discover hosts, as the Pong (Ping reply) contains host information.

Throughout this document, the terms server and client are interchangeable. Gnutella clients are Gnutella servers.

Thanks to capnbry for his efforts in decoding the protocol and posting it.

How GnutellaNet works

General description

GnutellaNet works by "viral propagation". I send a message to you, and you send it to all clients connected to you. That way, I only need to know about you to know about the entire rest of the network.

A simple glance at this message delivery mechanism will tell you that it generates inordinate amounts of traffic. Take for example the defaults for Gnutella 0.54. It defaults to maintaining 25 active connections with a TTL of 7 (TTL means Time To Live, or the number of times a message can be passed on before it "dies"). In the worst of worlds, this means 25^7, or 6103515625 (6 billion) messages resulting from just one message!

Well, okay. In truth it isn't that bad. In reality, there are fewer than two thousand Gnutella clients on the GnutellaNet at any one time. That means that long before the TTL expires on our hypothetical message, every client on the GnutellaNet will have seen our message.
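As a quick check of the arithmetic above, here is a small sketch. It assumes a fan-out of 25 connections and 7 hops (the exponent in the 25^7 figure); the function name is made up for illustration.

```python
# Worst-case flood size, assuming every host relays to 25 fresh hosts
# and nobody ever sees a duplicate (which does not happen in practice).
CONNECTIONS = 25   # default connection count quoted above
HOPS = 7           # exponent used in the 25^7 figure above


def worst_case_messages(connections: int, hops: int) -> int:
    """Messages generated at the final hop of an unchecked flood."""
    return connections ** hops


print(worst_case_messages(CONNECTIONS, HOPS))
```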

Obviously, once a client sees a message, it's unnecessary for it to process the message again. The original Gnutella designers, in recognition of this, engineered each message to contain a GUID (Globally Unique Identifier) which allows Gnutella clients to uniquely identify each message on the network.

So how do Gnutella clients take advantage of the GUID? Each Gnutella client maintains a short memory of the GUIDs it has seen. For example, I will remember each message I have received. I forward each message I receive as appropriate, unless I have already seen the message. If I have seen the message, that means I have already forwarded it, so everyone I forwarded it to has already seen it, and so on. So I just forget about the duplicate and save everyone the trouble.
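The "short memory" described above could be sketched as a bounded cache of GUIDs. This is only one way to do it; the class name, method name and capacity are hypothetical and not taken from any real client.

```python
from collections import OrderedDict


class SeenGuids:
    """Short, bounded memory of message GUIDs (oldest entries are forgotten)."""

    def __init__(self, capacity: int = 10000):
        self.capacity = capacity          # hypothetical size; tune as needed
        self._seen = OrderedDict()

    def first_time(self, guid: bytes) -> bool:
        """Return True if guid is new (and remember it), False for a duplicate."""
        if guid in self._seen:
            return False
        self._seen[guid] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)   # drop the oldest GUID
        return True
```

A client would call first_time() on every incoming message and forward only when it returns True.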

Topology

The GnutellaNet has no hierarchy. Every server is equal. Every server is also a client. So everyone contributes. Well, as in all egalitarian systems, some servers are more equal than others. Servers running on fast connections can support more traffic. They become a hub for others, and therefore get their requests answered much more quickly. Servers on slow connections are relegated to the backwaters of the GnutellaNet, and get search results much more slowly. And if they pretend to be fast, they get flooded to death.

But there's more to it than that.

Each Gnutella server only knows about the servers that it is directly connected to. All other servers are invisible, unless they announce themselves by answering a PING or by replying to a QUERY. This provides amazing anonymity.

Unfortunately, the combination of having no hierarchy and the lack of a definitive source for a server list means that the network is not easily described. It is not a tree (since there is no hierarchy) and it is cyclic. Being cyclic means there is a lot of needless network traffic. Clients today do not do much to reduce the traffic, but for the GnutellaNet to scale, developers will need to start thinking about that.

Connecting to a server
After making the initial connection to the server, you must handshake. Currently, the handshake is very simple. The connecting client says:

GNUTELLA CONNECT/0.4\n\n

The accepting server responds:

GNUTELLA OK\n\n

After that, it's all data.
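A minimal sketch of the connecting side of this handshake, using Python's standard socket module. The function name is made up for illustration, and error handling is kept to the bare minimum.

```python
import socket

CONNECT = b"GNUTELLA CONNECT/0.4\n\n"
OK = b"GNUTELLA OK\n\n"


def handshake(sock: socket.socket) -> bool:
    """Send the connect string and check that the peer answers with OK."""
    sock.sendall(CONNECT)
    reply = b""
    # Read exactly as many bytes as the expected reply, so that no
    # message data following the handshake is consumed by accident.
    while len(reply) < len(OK):
        chunk = sock.recv(len(OK) - len(reply))
        if not chunk:            # peer closed the connection early
            return False
        reply += chunk
    return reply == OK
```

Typical use would be to open a TCP connection with socket.create_connection() and pass the resulting socket to handshake() before reading any messages.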

Downloading from a server
Downloading files from a server is extremely easy. It's HTTP. The downloading client requests the file in the normal way:

GET /get/1234/strawberry-rhubarb-pies.rcp HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
\r\n

As you can see, Gnutella supports the Range header for resuming partial downloads. The 1234 is the file index (see the query hits section, below), and "strawberry-rhubarb-pies.rcp" is the filename.

The server will respond with normal HTTP headers. For example:

HTTP 200 OK\r\n
Server: Gnutella\r\n
Content-type:application/binary\r\n
Content-length: 948\r\n
\r\n

The important bit is the "Content-length" header. That tells you how much data to expect. After you get your fill, close the socket.
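The request and the length lookup described above could be sketched like this. Both function names are hypothetical. The sketch assumes an ASCII filename and matches the header name without regard to case, since servers may not agree on capitalization.

```python
def build_get_request(index: int, filename: str, start: int = 0) -> bytes:
    """Build the GET request shown above; start is the resume offset."""
    lines = [
        f"GET /get/{index}/{filename} HTTP/1.0",
        "Connection: Keep-Alive",
        f"Range: bytes={start}-",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def content_length(header_block: bytes) -> int:
    """Extract the content length from a response header block."""
    for line in header_block.split(b"\r\n"):
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-length":
            return int(value.strip())
    raise ValueError("no Content-length header found")
```

To resume a download, pass the number of bytes already on disk as start.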

Header
bytes   summary                                    description
0-15    Message identifier                         This is a Windows GUID. I'm not really sure how globally-unique this has to be. It is used to determine if a particular message has already been seen.
16      Payload descriptor (function identifier)   Value   Function
                                                   0x00    Ping
                                                   0x01    Pong (Ping reply)
                                                   0x40    Push request
                                                   0x80    Query
                                                   0x81    Query hits (Query reply)
17      TTL                                        Time to live. Each time a message is forwarded its TTL is decremented by one. If a message is received with TTL less than one (1), it should not be forwarded.
18      Hops                                       Number of times this message has been forwarded.
19-22   Payload length                             The length of the ensuing payload.
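The 23-byte header above maps directly onto Python's struct module. The sketch below assumes network byte order for the payload length, following the note at the top of this document; that is an assumption worth checking against real traffic. All names are made up for illustration.

```python
import struct
import uuid

# 16-byte GUID, 1-byte function, 1-byte TTL, 1-byte hops, 4-byte length.
# ">" selects network byte order with no padding, per the Notes section;
# that choice is an assumption to verify on the wire.
HEADER = struct.Struct(">16sBBBI")

PING, PONG, PUSH, QUERY, QUERY_HITS = 0x00, 0x01, 0x40, 0x80, 0x81


def pack_header(guid: bytes, function: int, ttl: int, hops: int, length: int) -> bytes:
    """Build the 23-byte header that precedes every payload."""
    return HEADER.pack(guid, function, ttl, hops, length)


def unpack_header(data: bytes):
    """Return (guid, function, ttl, hops, payload_length) from 23 bytes."""
    return HEADER.unpack(data[:HEADER.size])


def new_guid() -> bytes:
    # Any 16 random bytes serve as a message identifier in this sketch.
    return uuid.uuid4().bytes
```

For example, pack_header(new_guid(), PING, 7, 0, 0) gives a complete PING, since PING has no payload.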

Payload: ping (function 0x00)
No payload
Routing instructions
Forward PING packets to all connected clients. Most other documents state that you should not forward packets to their originators. I think that's a good optimization, but not a real requirement. A server should be smart enough to know not to forward a packet that it originated.
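The forwarding rule for PING, combined with the TTL and hops rules from the header section, could be sketched as follows. The function name is hypothetical, and peers can be any objects that represent open connections.

```python
def forward_broadcast(ttl: int, hops: int, came_from, peers):
    """Decide how to relay a PING (or any other broadcast message).

    Returns (new_ttl, new_hops, targets). targets is empty when the
    message should not be forwarded.
    """
    if ttl < 1:                       # expired on arrival: do not forward
        return ttl, hops, []
    # Optional optimization from the text: skip the peer that sent it to us.
    targets = [p for p in peers if p != came_from]
    return ttl - 1, hops + 1, targets
```

A client would run this only for messages whose GUID it has not seen before.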

A cursory analysis of GnutellaNet traffic shows that PING comprises roughly 50% of the network traffic. Clearly, this needs to be optimized. One of the problems with clients today is that they seem to PING the network periodically. That is indeed necessary, but the frequency of these "update" PINGs can be drastically reduced. Simply watching the PONG messages that your client routes is enough to capture lots of hosts.

One possible way to really reduce the number of PINGs is to alter the protocol to support PING messages which include PONG data. That way you need only wait for hosts to announce themselves, rather than discovering them yourself.

Payload: pong (Ping reply) (function 0x01)
bytes   summary               description
0-1     Port                  IPv4 port number.
2-5     IP address            IPv4 address. x86 byte order! Little endian!
6-9     Number of files       Number of files the host is sharing.
10-13   Number of kilobytes   Number of kilobytes the host is sharing.
Routing instructions
Like all replies, PONG packets are "routed". In other words, you need to forward this packet only back down the path its PING came from. If you didn't see its PING, then you have an interesting situation that should never arise. Why? If you didn't see the PING that corresponds with this PONG, then the server sending this PONG routed it incorrectly.
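Routing a reply "back down the path" needs a small table that remembers where each request came from. The sketch below assumes that a reply carries the same message identifier as the request it answers, which this document implies but does not spell out. The class and method names are hypothetical.

```python
class ReplyRouter:
    """Remembers which connection each request GUID arrived on."""

    def __init__(self):
        self._origin = {}   # request GUID -> connection it came from

    def note_request(self, guid: bytes, connection) -> None:
        # Keep the first connection only; later copies are duplicates.
        self._origin.setdefault(guid, connection)

    def route_reply(self, guid: bytes):
        """Return the connection a reply should go to, or None to drop it."""
        return self._origin.get(guid)
```

The table grows without bound here; a real client would probably expire old entries the same way it forgets old GUIDs.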

Payload: query (function 0x80)
bytes   summary           description
0-1     Minimum speed     The minimum speed, in kilobytes/sec, of hosts which should reply to this request.
2+      Search criteria   Search keywords or other criteria. NULL terminated.
Routing instructions
Forward QUERY messages to all connected servers.
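A sketch of building and reading the QUERY payload. It assumes network byte order for the minimum speed (per the Notes section) and ASCII search criteria; both are assumptions, and the function names are made up for illustration.

```python
import struct


def pack_query(min_speed: int, criteria: str) -> bytes:
    """Two-byte minimum speed followed by NUL-terminated search criteria."""
    # ">H" = unsigned 16-bit in network byte order (assumed, per the Notes).
    return struct.pack(">H", min_speed) + criteria.encode("ascii") + b"\x00"


def unpack_query(payload: bytes):
    """Return (min_speed, criteria) from a QUERY payload."""
    (min_speed,) = struct.unpack_from(">H", payload, 0)
    end = payload.find(b"\x00", 2)
    if end == -1:                 # tolerate a missing terminator
        end = len(payload)
    return min_speed, payload[2:end].decode("ascii", errors="replace")
```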

Payload: query hits (query reply) (function 0x81)
bytes           summary              description
0               Number of hits (N)   The number of hits in this set. See "Result set" below.
1-2             Port                 IPv4 port number.
3-6             IP address           IPv4 address. x86 byte order! Little endian!
7-10            Speed                Speed, in kilobits/sec, of the responding host.
11+             Result set           There are N of these (see "Number of hits" above).

                                     bytes   summary     description
                                     0-3     Index       Index number of file.
                                     4-7     Size        Size of file in bytes.
                                     8+      File name   Name of file. Terminated by double-NULL.
Last 16 bytes   Client identifier    GUID of the responding host. Used in PUSH.
Routing instructions
HITS are routed. Send these messages back on their inbound path.
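The query hits payload is the only one with a variable number of records, so a parsing sketch may help. It assumes network byte order for port, speed, index and size (per the Notes section) and leaves the IP address as four raw bytes, since this document flags its byte order as different. The function name is hypothetical.

```python
import struct


def unpack_query_hits(payload: bytes):
    """Return (port, ip_raw, speed, results, client_guid).

    results is a list of (index, size, name) tuples. ip_raw is left as
    the 4 raw bytes because its byte order differs from the other fields.
    """
    count = payload[0]                                  # number of hits (N)
    (port,) = struct.unpack_from(">H", payload, 1)      # assumed network order
    ip_raw = payload[3:7]
    (speed,) = struct.unpack_from(">I", payload, 7)
    client_guid = payload[-16:]                         # last 16 bytes
    body = payload[11:-16]                              # the result set
    results, pos = [], 0
    for _ in range(count):
        index, size = struct.unpack_from(">II", body, pos)
        end = body.index(b"\x00\x00", pos + 8)          # double-NULL terminator
        name = body[pos + 8:end].decode("ascii", "replace")
        results.append((index, size, name))
        pos = end + 2
    return port, ip_raw, speed, results, client_guid
```

The index and file name from each result are what the downloading client later puts into its GET request.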

Payload: push request (function 0x40)
bytes   summary             description
0-15    Client identifier   GUID of the host which should push.
16-19   Index               Index number of file (given in query hit).
20-23   IP address          IPv4 address to push to.
24-25   Port                IPv4 port number to push to.
Routing instructions
Forward PUSH messages only along the path on which the query hit was delivered. If you missed the query hit then drop the packet, since you are not instrumental in the delivery of the PUSH request.
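One way to follow this rule is to remember, for each client identifier seen in a query hit, which connection that hit arrived on. The sketch below is only an illustration of that idea; the class and method names are hypothetical.

```python
class PushRouter:
    """Maps a client identifier from a query hit to the connection it arrived on."""

    def __init__(self):
        self._path = {}   # client GUID -> connection the query hit came from

    def note_query_hit(self, client_guid: bytes, connection) -> None:
        """Record the inbound connection of a query hit we routed."""
        self._path[client_guid] = connection

    def route_push(self, push_payload: bytes):
        """Return the connection to forward a PUSH on, or None to drop it."""
        client_guid = push_payload[:16]     # bytes 0-15 of the PUSH payload
        return self._path.get(client_guid)
```

A return value of None corresponds to the "missed the query hit" case above, where the packet is simply dropped.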

Need some feedback on this one.