Assumptions

  • You have a CF deployed
  • You have one proxy app pushed and named appA (the fewer apps you have deployed the better)
  • You have the public_networks security group bound to all running and staging containers

What

In the first story you learned what ASGs are, but how are they implemented under the hood?

The Non-Dynamic Way

Dynamic ASGs are enabled by default, but when they are disabled, this is how ASGs flow through the system. This is provided for historical context, since Dynamic ASGs and their codepath are relatively new.

[image: non-Dynamic ASG flow diagram]

Iptables rules for non-Dynamic ASGs are only written when a new app container is created. This means the app needs to be restarted in order for new ASGs to take effect. That sucks!
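
For example, with the appA proxy app from the assumptions above, picking up a newly bound ASG under the non-dynamic model looks roughly like this:

    # non-dynamic ASGs: iptables rules are only written at container creation,
    # so recreate the app's containers after binding a new or changed ASG
    cf restart appA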

The Dynamic Way

[image: Dynamic ASG flow diagram]

You’ll notice that the Dynamic ASG codepath is very similar to the c2c network policy codepath. Both c2c network policies and ASGs result in iptables rules on Diego Cells. So when the team implemented Dynamic ASGs, they decided to reuse the c2c architecture that already existed.
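
Once you create the ASG in the How section below, you should be able to find its iptables rule on a cell. A rough sketch, assuming a cell instance named diego-cell/0 that is running an instance of appA:

    # hop onto a Diego Cell (instance name may differ in your deployment)
    bosh ssh diego-cell/0
    # inside the ssh session: look for a rule with your ASG's destination
    sudo iptables -S | grep 1.2.3.4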

How

We are going to create our own ASG and follow it through the system.

📝 Make your own ASG with an easy to find IP

  1. Make a file asg.json
    [
      {
        "protocol": "all",
        "destination": "1.2.3.4"
      }
    ]
    
  2. Create your very own ASG using this file: cf create-security-group --help
  3. Bind it to all running apps: cf bind-running-security-group --help
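
For reference, steps 2 and 3 might look something like this (assuming you name the group asg-story):

    # create the ASG from the rules file (asg-story is just an example name),
    # then bind it to all running apps
    cf create-security-group asg-story asg.json
    cf bind-running-security-group asg-story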

🤔 See how it is stored in the Cloud Controller

  1. Use cf curl to call a v3 CAPI endpoint to list all of the security groups. Find your brand new security group.
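
One way to do that, assuming you named the group asg-story and have jq available:

    # list all security groups via the v3 API and pick out the new one
    cf curl /v3/security_groups | jq '.resources[] | select(.name == "asg-story")'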

📝 Look at the syncer logs

Let’s turn on debug logging and look at what the syncer is doing.

  1. Figure out which VM the Policy Server ASG Syncer is running on
    bosh instances --ps | grep policy-server-asg-syncer
    
  2. Bosh ssh onto that VM.
  3. Become root: sudo su -
  4. Look at the config for the syncer.
    less /var/vcap/jobs/policy-server-asg-syncer/config/policy-server-asg-syncer.json
    
  5. Change the log level in the config from “info” to “debug”.
  6. Restart the syncer process.
    monit restart policy-server-asg-syncer
    
  7. Watch the logs.
    tail -f /var/vcap/sys/log/policy-server-asg-syncer/policy-server-asg-syncer.stdout.log
    
  8. 😱 You might’ve seen something like the following in the logs
    {
      "timestamp": "2024-09-06T19:15:54.962147693Z",
      "level": "error",
      "source": "cfnetworking.policy-server-asg-syncer",
      "message": "cfnetworking.policy-server-asg-syncer.locket-lock.failed-to-acquire-lock",
      "data": {
        "error": "rpc error: code = AlreadyExists desc = lock-collision",
        "lock": {
          "key": "policy-server-asg-syncer",
          "owner": "9bbc1c96-f033-486b-a8ca-b240bb86a188",
          "type": "lock",
          "type_code": 1
        },
      }
    }
    

    This means that you have multiple VMs running with the syncer, but there can only be one syncer actually syncing at a time.

  9. If you saw the above failed-to-acquire-lock log message, ssh into the other VM running a syncer, become root, run monit stop policy-server-asg-syncer, then come back to this VM where the syncer has debug log level set.

❓ What do you see in the logs? Can you find your ASG referenced?

❓ What do you think the get-security-groups-last-update-response log line is all about? How might it relate to the skipping-update log lines?

Expected Result

Every time the syncer is restarted (if it has the lock), it will grab all ASGs from Cloud Controller and put them in the Policy Server database. You should’ve seen this happen in the logs with the message cfnetworking.policy-server-asg-syncer.get-security-groups-retry-loop-succeeded and then a list of ASGs, including the one you just created.

After this initial sync, the syncer only updates the ASGs in the Policy Server database when existing ASGs change or new ones are added.
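
To see an update-triggered sync, you could change the ASG and watch the debug logs again. A sketch, assuming the group is named asg-story and the syncer still has debug logging turned on:

    # edit asg.json to use a different destination, then push the change to CC
    cf update-security-group asg-story asg.json

    # back on the syncer VM, watch for a real sync instead of skipping-update
    tail -f /var/vcap/sys/log/policy-server-asg-syncer/policy-server-asg-syncer.stdout.log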

Resources