Evbogue.com


Under the hood with the Sbot/Decent gossip schedule

By Ev Bogue - February 4th 2017

Last week I took a vacation from socializing on the sbot network, because I was getting very grrr/agro about a code decision involving difficult-to-override inline styles in JavaScript pseudocode.

Within an hour of vacation I knew what I wanted to do: I decided to pull the Scuttlebot gossip schedule into the garage in order to learn more about what is going on under the hood. If this was an actual car, I'd have no idea what was going on under the hood. I'd still be trying to figure out what was wrong. To quote Box, when I was in the Alehouse parking lot early last year with a car that wouldn't turn off (even with the keys out of the ignition): "If this was a computer, you'd probably have this fixed by now, wouldn't you?"

What is the gossip schedule and why do you care? Scuttlebot is a daemon for distributed social networking that replicates cryptographically secure feeds created by public/private keypairs. The gossip schedule tells Scuttlebot when to reach out to your peers to check for new updates to their append-only feeds.

I have three reasons for pulling the code into the garage:

  1. Sometimes the gossip schedule doesn't work, and I'd love to isolate the bug and fix it
  2. I want to learn how the gossip schedule works
  3. I want to learn everything Dominic Tarr knows about programming

Initially the Scuttlebot gossip schedule seemed to be total 'mad science' to me, but over time I realized there is a strategy behind the magic.

I ended up creating a slightly different gossip schedule based on an older version of Scuttlebot that seems pretty decent to me.

For the purposes of this article, it might be interesting to open these two different gossip schedules in browser tabs. They're both hosted at gitmx.com using git-ssb:

Scuttlebot/plugins/gossip/schedule.js

Decent/plugins/gossip/schedule.js

I spent a lot of time with these two schedules open side by side in two i3 windows.

The first section I decided to take a look at was the gossip schedule itself, which starts with function connections() { on line 137 in Scuttlebot, and line 124 in Decent.

In Scuttlebot this section of the code has gotten a little bit complex ever since 'Friend Prioritization' and 'Persistent Gossip' were introduced. This is how it looks now:

  var connecting = false
  function connections () {
    if(connecting) return
    connecting = true
    setTimeout(function () {
      connecting = false
      var ts = Date.now()
      var peers = gossip.peers()

      var connected = peers.filter(and(isConnect, not(isLocal), not(isFriend))).length
      var connectedFriends = peers.filter(and(isConnect, isFriend)).length

      connect(peers, ts, 'local', isLocal, {
        quota: 3, factor: 2e3, max: 10*min, groupMin: 1e3,
        disable: !conf('local', true)
      })

      // prioritize friends
      connect(peers, ts, 'friends', and(exports.isFriend, exports.isLongterm), {
        quota: 2, factor: 10e3, max: 10*min, groupMin: 5e3,
        disable: !conf('local', true)
      })

      if (connectedFriends < 2)
        connect(peers, ts, 'attemptFriend', and(exports.isFriend, exports.isUnattempted), {
          min: 0, quota: 1, factor: 0, max: 0, groupMin: 0,
          disable: !conf('global', true)
        })

      connect(peers, ts, 'retryFriends', and(exports.isFriend, exports.isInactive), {
        min: 0,
        quota: 3, factor: 60e3, max: 3*60*60e3, groupMin: 5*60e3
      })

      // standard longterm peers
      connect(peers, ts, 'longterm', and(
        exports.isLongterm,
        not(exports.isFriend),
        not(exports.isLocal)
      ), {
        quota: 2, factor: 10e3, max: 10*min, groupMin: 5e3,
        disable: !conf('global', true)
      })

      if(!connected)
        connect(peers, ts, 'attempt', exports.isUnattempted, {
          min: 0, quota: 1, factor: 0, max: 0, groupMin: 0,
          disable: !conf('global', true)
        })

      //quota, groupMin, min, factor, max
      connect(peers, ts, 'retry', exports.isInactive, {
        min: 0,
        quota: 3, factor: 5*60e3, max: 3*60*60e3, groupMin: 5*50e3
      })

      var longterm = peers.filter(isConnect).filter(isLongterm).length

      connect(peers, ts, 'legacy', exports.isLegacy, {
        quota: 3 - longterm,
        factor: 5*min, max: 3*hour, groupMin: 5*min,
        disable: !conf('global', true)
      })

      peers.filter(isConnect).forEach(function (e) {
        var permanent = exports.isLongterm(e) || exports.isLocal(e)
        if((!permanent || e.state === 'connecting') && e.stateChange + 10e3 < ts) {
          gossip.disconnect(e)
        }
      })

    }, 100*Math.random())

  }

As you can see, there's a lot going on up there.
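
A lot of the density comes from the little predicate helpers: and, not, isConnect, isLocal, isFriend. Here is a minimal sketch of how helpers like these could work. The function bodies and the peer fields (state, source, friend) are assumptions made for illustration, not copied from schedule.js.

```javascript
// Hypothetical stand-ins for the helpers used in the schedule above.
// and(...) passes only when every predicate passes; not(fn) inverts one.
function and () {
  var fns = [].slice.call(arguments)
  return function (peer) {
    return fns.every(function (fn) { return fn(peer) })
  }
}

function not (fn) {
  return function (peer) { return !fn(peer) }
}

// Assumed peer shape: { state, source, friend }. The real fields may differ.
function isConnect (e) { return e.state === 'connected' || e.state === 'connecting' }
function isLocal (e) { return e.source === 'local' }
function isFriend (e) { return e.friend === true }

var peers = [
  { state: 'connected', source: 'pub', friend: false },   // plain connected pub
  { state: 'connected', source: 'local', friend: false }, // local peer
  { state: 'connected', source: 'pub', friend: true },    // connected friend
  { state: undefined, source: 'pub', friend: false }      // not connected
]

// The same two counts the Scuttlebot schedule computes up front
var connected = peers.filter(and(isConnect, not(isLocal), not(isFriend))).length
var connectedFriends = peers.filter(and(isConnect, isFriend)).length

console.log(connected, connectedFriends) // prints: 1 1
```

Read this way, each connect call in the schedule is a filter that picks a group of peers, plus a set of options for that group.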

Because the gossip schedule doesn't always work, I decided it might be a good idea to go back to a simpler age in gossip protocols, so I checked out a version of Scuttlebot from mid-September 2016. This is the gossip schedule I added to Decent:

  function connections () {
    var ts = Date.now()
    var peers = gossip.peers()

    //quota, groupMin, min, factor, max

    connect(peers, ts, 'attempt', exports.isUnattempted, {
      min: 0, quota: 1, factor: 0, max: 0, groupMin: 0,
      disable: !conf('global', true)
    })

    connect(peers, ts, 'retry', exports.isInactive, {
      min: 0, quota: 3, factor: 5*60e3, max: 3*60*60e3, groupMin: 5*50e3
    })

    connect(peers, ts, 'legacy', exports.isLegacy, {
      quota: 3, factor: 5*min, max: 3*hour, groupMin: 5*min,
      disable: !conf('global', true)
    })

    connect(peers, ts, 'longterm', exports.isLongterm, {
      quota: 3, factor: 10e3, max: 10*min, groupMin: 5e3,
      disable: !conf('global', true)
    })

    connect(peers, ts, 'local', exports.isLocal, {
      quota: 3, factor: 2e3, max: 10*min, groupMin: 1e3,
      disable: !conf('local', true)
    })
  }

The above schedule isn't exactly what was going on in September 2016, but it's similar and way simpler.

What we have in the above code is five different types of gossip: attempt, retry, legacy, longterm, and local.

  • Attempt is trying to connect to peers for the first time
  • Retry attempts to connect to them again, bumping up the time frame exponentially every retry, eventually maxing out at the max value, which is 3*60*60e3 milliseconds (three hours) in the code above
  • Legacy connects with legacy scuttlebot peers -- I could probably rip this out of Decent because there are no legacy peers
  • Longterm is for peers which maintain an open connection
  • Local peers are people who are using Decent/Sbot on your local wifi network
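
To put numbers on the retry behavior, here is a minimal sketch of capped exponential backoff. It assumes the wait doubles with each failed attempt, starting from factor and never going past max. The delay function and the failures count are hypothetical names for illustration; only the factor and max values come from the 'retry' options above.

```javascript
// Sketch of capped exponential backoff, under the assumption that the wait
// doubles per failure. `failures` is a hypothetical per-peer failure count.
function delay (failures, factor, max) {
  return Math.min(Math.pow(2, failures) * factor, max)
}

var min = 60e3 // one minute in milliseconds

// The 'retry' numbers from the schedule above
var retry = { factor: 5 * 60e3, max: 3 * 60 * 60e3 }

// Wait before each retry, in minutes, after 0..7 failures
var waits = [0, 1, 2, 3, 4, 5, 6, 7].map(function (failures) {
  return delay(failures, retry.factor, retry.max) / min
})

console.log(waits) // waits in minutes: 5, 10, 20, 40, 80, 160, 180, 180
```

Under that assumption the 'retry' numbers would start at five minutes and top out at a three hour wait after six failures.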

ts is the timestamp taken with Date.now() each time the schedule runs. peers is the list of pubs that the gossip schedule collects from your secure scuttlebutt log when you first fire up the daemon.
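
To make the arguments concrete, here is a hypothetical sketch of what a helper with the same signature as connect could do with peers, ts, a filter, and a quota. This is not the real Scuttlebot code: the gossip object is a stand-in that only records calls, the peer fields (key, state, source, stateChange) are assumed, and the sketch ignores factor, max, and groupMin entirely.

```javascript
// Stand-in for the gossip API: it only records which peers were dialed.
var calls = []
var gossip = {
  connect: function (peer) { peer.state = 'connecting'; calls.push(peer.key) }
}

function isConnect (e) { return e.state === 'connected' || e.state === 'connecting' }
function isLocal (e) { return e.source === 'local' }

// Hypothetical connect(): fill a group up to its quota, oldest first.
function connect (peers, ts, name, filter, opts) {
  if (opts.disable) return
  var group = peers.filter(filter)
  var free = opts.quota - group.filter(isConnect).length
  if (free <= 0) return // quota already full for this group
  group
    .filter(function (e) { return !isConnect(e) })
    // peers whose state changed longest ago go first
    .sort(function (a, b) { return a.stateChange - b.stateChange })
    .slice(0, free)
    .forEach(function (e) {
      e.stateChange = ts
      gossip.connect(e)
    })
}

var peers = [
  { key: 'a', source: 'local', state: 'connected', stateChange: 1 },
  { key: 'b', source: 'local', stateChange: 5 },
  { key: 'c', source: 'local', stateChange: 2 },
  { key: 'd', source: 'local', stateChange: 9 },
  { key: 'e', source: 'pub', stateChange: 3 }
]

connect(peers, Date.now(), 'local', isLocal, { quota: 3 })
connect(peers, Date.now(), 'local', isLocal, { quota: 3 }) // quota full, no-op

console.log(calls) // prints: [ 'c', 'b' ]
```

With one local peer already connected and a quota of 3, the first call dials the two local peers that have waited longest, and the second call does nothing because the quota is full.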

Now what is going on in the Scuttlebot gossip schedule farther up the page? I don't really know. There were two patches introduced in October 2016 that were trying to solve a problem that distributed systems have, which is dead pubs. These are 'persistent gossip' and 'friend prioritization'. Both of these patches got added at the same time, and (it seems to me) both attempt to fix the same problem in different ways.

I'm not convinced that these patches succeed in their mission, because when I started using this current Scuttlebot gossip schedule on the Decent network I discovered that it was connecting to some local and longterm peers over and over again, and never disconnecting until the quota was maxed out at 12 or 16 connections. While Decent doesn't have dead pubs yet, I also wonder why the exponential factor on retry wasn't enough to solve the problem.

As I said on the most recent ssbc call, I don't have any firm conclusions to move upstream yet. However, I do think the simpler gossip schedule is easier to read and understand.

In the meantime though, we still have our bug where sometimes replication just fails to happen. However, it happens in both of these gossip schedules so I'm convinced the problem lies elsewhere. Perhaps longterm connections are closing and not being registered as closed?

I will continue digging around under the hood until I solve the issue.
