Neural network Q learning in Smarts

Debugging

Example

With verbosity set to 1, a single time step for a single critter might print the following messages.

 Brain0 sensed [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0 selecting best action for state [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0 activating network with input [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0 found best action: 0
 Brain0 decided on action 0
 Memory0 learning
 Memory0 selecting highest Q for "next" state [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0 activating network with input [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0 found highest Q: 0.33656863599341025
 Memory0: target Q for action 6: 0.23559804519538716
 Memory0 activating network with input [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0: activation of output unit 6: 0.33656863599341025
 Memory0: error for output unit 6: -0.10097059079802309
 Memory0 updating weights
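
The numbers in this log are consistent with the usual Q-learning target, target = reward + gamma * (highest Q for the next state), followed by error = target - activation. The sketch below reproduces the logged values under two assumptions that do not appear in the log and are inferred only from the arithmetic: the reward on this step was 0 and the discount factor gamma was 0.7. The function names are hypothetical and are not part of Smarts.

```python
def q_target(reward, max_next_q, gamma):
    """Q-learning target: immediate reward plus discounted best next Q."""
    return reward + gamma * max_next_q


def output_error(target, activation):
    """Error for the output unit of the action being learned."""
    return target - activation


# Values copied from the verbosity-1 log above.
max_next_q = 0.33656863599341025   # "found highest Q"
activation = 0.33656863599341025   # "activation of output unit 6"

# Assumed, not logged: reward = 0 and gamma = 0.7.
target = q_target(reward=0.0, max_next_q=max_next_q, gamma=0.7)
error = output_error(target, activation)

print("target Q:", target)
print("error:", error)
```

With these assumed settings the results agree with the logged target Q (0.2355...) and error (-0.1009...) to within floating-point rounding.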

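Here is an example of the same network with verbosity set to 2. At this level the output also lists the Q value of every action and the change applied to every weight.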

STEPPING THE CRTTER
 Brain0 sensed [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0 selecting best action for state [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0 activating network with input [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0 found best action: 3
 Brain0 decided on action 3
 Memory0 learning
 Memory0 selecting highest Q for "next" state [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0 activating network with input [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
  Q(0): 0.1117156060326332
  Q(1): -0.8906225239255476
  Q(2): -0.08406349392140866
  Q(3): -0.07716272973761817
  Q(4): -0.690736202107713
  Q(5): -0.9835603355115099
  Q(6): 0.3012289292141021
 Memory0 found highest Q: 0.3012289292141021
 Memory0: target Q for action 0: -0.7891397495501286
 Memory0 activating network with input [1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]
 Memory0: activation of output unit 0: 0.1117156060326332
 Memory0: error for output unit 0: -0.9008553555827618
 Memory0 updating weights
  Memory0: weight change for 0->0: -0.04504276777913809
  Memory0: weight change for 1->0: -0.0
  Memory0: weight change for 2->0: -0.0
  Memory0: weight change for 3->0: -0.0
  Memory0: weight change for 4->0: -0.04504276777913809
  Memory0: weight change for 5->0: -0.0
  Memory0: weight change for 6->0: -0.0
  Memory0: weight change for 7->0: -0.0
  Memory0: weight change for 8->0: -0.04504276777913809
  Memory0: weight change for 9->0: -0.0
  Memory0: weight change for 10->0: -0.0
  Memory0: weight change for 11->0: -0.0
  Memory0: weight change for 12->0: -0.04504276777913809
  Memory0: weight change for 13->0: -0.0
  Memory0: weight change for 14->0: -0.0
  Memory0: weight change for 15->0: -0.0
  Memory0: weight change for 16->0: -0.04504276777913809
  Memory0: weight change for 17->0: -0.0
  Memory0: weight change for 18->0: -0.0
  Memory0: weight change for 19->0: -0.0
  Memory0: weight change for 20->0: -0.04504276777913809
  Memory0: weight change for 21->0: -0.0
  Memory0: weight change for 22->0: -0.0
  Memory0: weight change for 23->0: -0.0
  Memory0: weight change for 24->0: -0.04504276777913809
  Memory0: weight change for 0->1: 0.0
  Memory0: weight change for 1->1: 0.0
  Memory0: weight change for 2->1: 0.0
  Memory0: weight change for 3->1: 0.0
  Memory0: weight change for 4->1: 0.0
  Memory0: weight change for 5->1: 0.0
  Memory0: weight change for 6->1: 0.0
  Memory0: weight change for 7->1: 0.0
  Memory0: weight change for 8->1: 0.0
  Memory0: weight change for 9->1: 0.0
  Memory0: weight change for 10->1: 0.0
  Memory0: weight change for 11->1: 0.0
  Memory0: weight change for 12->1: 0.0
  Memory0: weight change for 13->1: 0.0
  Memory0: weight change for 14->1: 0.0
  Memory0: weight change for 15->1: 0.0
  Memory0: weight change for 16->1: 0.0
  Memory0: weight change for 17->1: 0.0
  Memory0: weight change for 18->1: 0.0
  Memory0: weight change for 19->1: 0.0
  Memory0: weight change for 20->1: 0.0
  Memory0: weight change for 21->1: 0.0
  Memory0: weight change for 22->1: 0.0
  Memory0: weight change for 23->1: 0.0
  Memory0: weight change for 24->1: 0.0
  Memory0: weight change for 0->2: 0.0
  Memory0: weight change for 1->2: 0.0
  Memory0: weight change for 2->2: 0.0
  Memory0: weight change for 3->2: 0.0
  Memory0: weight change for 4->2: 0.0
  Memory0: weight change for 5->2: 0.0
  Memory0: weight change for 6->2: 0.0
  Memory0: weight change for 7->2: 0.0
  Memory0: weight change for 8->2: 0.0
  Memory0: weight change for 9->2: 0.0
  Memory0: weight change for 10->2: 0.0
  Memory0: weight change for 11->2: 0.0
  Memory0: weight change for 12->2: 0.0
  Memory0: weight change for 13->2: 0.0
  Memory0: weight change for 14->2: 0.0
  Memory0: weight change for 15->2: 0.0
  Memory0: weight change for 16->2: 0.0
  Memory0: weight change for 17->2: 0.0
  Memory0: weight change for 18->2: 0.0
  Memory0: weight change for 19->2: 0.0
  Memory0: weight change for 20->2: 0.0
  Memory0: weight change for 21->2: 0.0
  Memory0: weight change for 22->2: 0.0
  Memory0: weight change for 23->2: 0.0
  Memory0: weight change for 24->2: 0.0
  Memory0: weight change for 0->3: 0.0
  Memory0: weight change for 1->3: 0.0
  Memory0: weight change for 2->3: 0.0
  Memory0: weight change for 3->3: 0.0
  Memory0: weight change for 4->3: 0.0
  Memory0: weight change for 5->3: 0.0
  Memory0: weight change for 6->3: 0.0
  Memory0: weight change for 7->3: 0.0
  Memory0: weight change for 8->3: 0.0
  Memory0: weight change for 9->3: 0.0
  Memory0: weight change for 10->3: 0.0
  Memory0: weight change for 11->3: 0.0
  Memory0: weight change for 12->3: 0.0
  Memory0: weight change for 13->3: 0.0
  Memory0: weight change for 14->3: 0.0
  Memory0: weight change for 15->3: 0.0
  Memory0: weight change for 16->3: 0.0
  Memory0: weight change for 17->3: 0.0
  Memory0: weight change for 18->3: 0.0
  Memory0: weight change for 19->3: 0.0
  Memory0: weight change for 20->3: 0.0
  Memory0: weight change for 21->3: 0.0
  Memory0: weight change for 22->3: 0.0
  Memory0: weight change for 23->3: 0.0
  Memory0: weight change for 24->3: 0.0
  Memory0: weight change for 0->4: 0.0
  Memory0: weight change for 1->4: 0.0
  Memory0: weight change for 2->4: 0.0
  Memory0: weight change for 3->4: 0.0
  Memory0: weight change for 4->4: 0.0
  Memory0: weight change for 5->4: 0.0
  Memory0: weight change for 6->4: 0.0
  Memory0: weight change for 7->4: 0.0
  Memory0: weight change for 8->4: 0.0
  Memory0: weight change for 9->4: 0.0
  Memory0: weight change for 10->4: 0.0
  Memory0: weight change for 11->4: 0.0
  Memory0: weight change for 12->4: 0.0
  Memory0: weight change for 13->4: 0.0
  Memory0: weight change for 14->4: 0.0
  Memory0: weight change for 15->4: 0.0
  Memory0: weight change for 16->4: 0.0
  Memory0: weight change for 17->4: 0.0
  Memory0: weight change for 18->4: 0.0
  Memory0: weight change for 19->4: 0.0
  Memory0: weight change for 20->4: 0.0
  Memory0: weight change for 21->4: 0.0
  Memory0: weight change for 22->4: 0.0
  Memory0: weight change for 23->4: 0.0
  Memory0: weight change for 24->4: 0.0
  Memory0: weight change for 0->5: 0.0
  Memory0: weight change for 1->5: 0.0
  Memory0: weight change for 2->5: 0.0
  Memory0: weight change for 3->5: 0.0
  Memory0: weight change for 4->5: 0.0
  Memory0: weight change for 5->5: 0.0
  Memory0: weight change for 6->5: 0.0
  Memory0: weight change for 7->5: 0.0
  Memory0: weight change for 8->5: 0.0
  Memory0: weight change for 9->5: 0.0
  Memory0: weight change for 10->5: 0.0
  Memory0: weight change for 11->5: 0.0
  Memory0: weight change for 12->5: 0.0
  Memory0: weight change for 13->5: 0.0
  Memory0: weight change for 14->5: 0.0
  Memory0: weight change for 15->5: 0.0
  Memory0: weight change for 16->5: 0.0
  Memory0: weight change for 17->5: 0.0
  Memory0: weight change for 18->5: 0.0
  Memory0: weight change for 19->5: 0.0
  Memory0: weight change for 20->5: 0.0
  Memory0: weight change for 21->5: 0.0
  Memory0: weight change for 22->5: 0.0
  Memory0: weight change for 23->5: 0.0
  Memory0: weight change for 24->5: 0.0
  Memory0: weight change for 0->6: 0.0
  Memory0: weight change for 1->6: 0.0
  Memory0: weight change for 2->6: 0.0
  Memory0: weight change for 3->6: 0.0
  Memory0: weight change for 4->6: 0.0
  Memory0: weight change for 5->6: 0.0
  Memory0: weight change for 6->6: 0.0
  Memory0: weight change for 7->6: 0.0
  Memory0: weight change for 8->6: 0.0
  Memory0: weight change for 9->6: 0.0
  Memory0: weight change for 10->6: 0.0
  Memory0: weight change for 11->6: 0.0
  Memory0: weight change for 12->6: 0.0
  Memory0: weight change for 13->6: 0.0
  Memory0: weight change for 14->6: 0.0
  Memory0: weight change for 15->6: 0.0
  Memory0: weight change for 16->6: 0.0
  Memory0: weight change for 17->6: 0.0
  Memory0: weight change for 18->6: 0.0
  Memory0: weight change for 19->6: 0.0
  Memory0: weight change for 20->6: 0.0
  Memory0: weight change for 21->6: 0.0
  Memory0: weight change for 22->6: 0.0
  Memory0: weight change for 23->6: 0.0
  Memory0: weight change for 24->6: 0.0
  Received reinforcement: -1
  Received reinforcement: -1
  Received reinforcement: -1
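
The weight changes above are consistent with a simple delta rule, change = learning rate * error * input, applied to the output unit of the action being learned (unit 0 here). The sketch below illustrates that reading under stated assumptions: the learning rate 0.05 is inferred from the logged numbers, and input 24 is assumed to be an always-on bias unit, because the sensed state has 24 values while the weight changes are indexed 0 to 24. The function name is hypothetical.

```python
def weight_changes(inputs, error, learning_rate):
    """Delta-rule change for each weight feeding one output unit."""
    return [learning_rate * error * x for x in inputs]


# Sensed state from the verbosity-2 log above (24 values).
state = [1, 0, 0, 0] * 6

# Assumed: a bias input fixed at 1 is appended as input 24.
inputs = state + [1]

# "error for output unit 0" from the log.
error = -0.9008553555827618

# Assumed learning rate, inferred from the logged weight changes.
changes = weight_changes(inputs, error, learning_rate=0.05)

for i, change in enumerate(changes):
    print(f"weight change for {i}->0: {change}")
```

Only the inputs that were active (0, 4, 8, 12, 16, 20 and the assumed bias 24) receive a nonzero change, which matches the pattern in the log.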

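Here is an example of what the weight matrix might look like after several hundred time steps. Each row holds the weights for one action and each column corresponds to one state input; the 25 columns match the input indices 0 to 24 in the weight-change messages above.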

                                                                                             State
            -7.1 -11.48  21.56    0.0   0.54   0.92   1.51    0.0  -1.93   2.91    2.0    0.0    3.0   1.53  -1.57    0.0   1.12   0.81   1.04    0.0   1.51   0.26    1.2    0.0   2.97
            -0.1  -0.04  -0.06    0.0  -0.24  -0.02   0.06    0.0  -0.13  -0.04  -0.04    0.0  -0.02  -0.11  -0.07    0.0  -0.09  -0.09  -0.02    0.0   -0.1   -0.1    0.0    0.0   -0.2
            0.95    0.4  -0.05    0.0   0.68  -0.07   0.69    0.0  -4.68  -2.58   8.57    0.0   0.83   0.05   0.42    0.0  -1.04   0.43   1.91    0.0   1.87  -0.26  -0.31    0.0    1.3
Actions     0.92    0.0  -0.13    0.0   0.47    0.2   0.12    0.0    0.9  -0.11    0.0    0.0  -0.93  -0.09   1.81    0.0   -0.6    1.3   0.08    0.0  -1.62  -0.16   2.56    0.0   0.79
            0.06  -0.24    0.0    0.0   -0.1  -0.08    0.0    0.0  -0.01  -0.12  -0.06    0.0  -0.14  -0.04    0.0    0.0  -0.23  -0.24   0.29    0.0  -0.09  -0.09    0.0    0.0  -0.18
           -0.12  -0.03  -0.04    0.0  -0.09  -0.09    0.0    0.0  -0.28   -0.1    0.2    0.0  -0.06  -0.11    0.0    0.0  -0.17   -0.1    0.1    0.0  -0.09  -0.08  -0.01    0.0  -0.18
            0.07  -0.07   0.05    0.0   0.02   0.05  -0.02    0.0   0.09  -0.07   0.03    0.0   0.09  -0.05    0.0    0.0  -0.01  -0.02   0.07    0.0   0.05  -0.07   0.07    0.0   0.04
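
One way to read this matrix is to compute the Q value of each action as the weighted sum of the inputs along that action's row. The sketch below does this for the state shown in the logs. It assumes a single-layer network with linear output units and treats the last column as the weight of an always-on bias input; both are inferences from the log messages rather than documented facts, and the function name is hypothetical.

```python
# Rounded weights copied from the matrix above: 7 actions x 25 inputs.
WEIGHTS = [
    [-7.1, -11.48, 21.56, 0.0, 0.54, 0.92, 1.51, 0.0, -1.93, 2.91, 2.0, 0.0, 3.0,
     1.53, -1.57, 0.0, 1.12, 0.81, 1.04, 0.0, 1.51, 0.26, 1.2, 0.0, 2.97],
    [-0.1, -0.04, -0.06, 0.0, -0.24, -0.02, 0.06, 0.0, -0.13, -0.04, -0.04, 0.0, -0.02,
     -0.11, -0.07, 0.0, -0.09, -0.09, -0.02, 0.0, -0.1, -0.1, 0.0, 0.0, -0.2],
    [0.95, 0.4, -0.05, 0.0, 0.68, -0.07, 0.69, 0.0, -4.68, -2.58, 8.57, 0.0, 0.83,
     0.05, 0.42, 0.0, -1.04, 0.43, 1.91, 0.0, 1.87, -0.26, -0.31, 0.0, 1.3],
    [0.92, 0.0, -0.13, 0.0, 0.47, 0.2, 0.12, 0.0, 0.9, -0.11, 0.0, 0.0, -0.93,
     -0.09, 1.81, 0.0, -0.6, 1.3, 0.08, 0.0, -1.62, -0.16, 2.56, 0.0, 0.79],
    [0.06, -0.24, 0.0, 0.0, -0.1, -0.08, 0.0, 0.0, -0.01, -0.12, -0.06, 0.0, -0.14,
     -0.04, 0.0, 0.0, -0.23, -0.24, 0.29, 0.0, -0.09, -0.09, 0.0, 0.0, -0.18],
    [-0.12, -0.03, -0.04, 0.0, -0.09, -0.09, 0.0, 0.0, -0.28, -0.1, 0.2, 0.0, -0.06,
     -0.11, 0.0, 0.0, -0.17, -0.1, 0.1, 0.0, -0.09, -0.08, -0.01, 0.0, -0.18],
    [0.07, -0.07, 0.05, 0.0, 0.02, 0.05, -0.02, 0.0, 0.09, -0.07, 0.03, 0.0, 0.09,
     -0.05, 0.0, 0.0, -0.01, -0.02, 0.07, 0.0, 0.05, -0.07, 0.07, 0.0, 0.04],
]


def q_values(weights, state):
    """Q value per action as a weighted sum of the inputs (linear outputs assumed)."""
    inputs = list(state) + [1]  # assumed bias input appended as the last column
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


state = [1, 0, 0, 0] * 6  # the sensed state from the logs
q = q_values(WEIGHTS, state)
best = max(range(len(q)), key=lambda a: q[a])  # index of the highest Q

for action, value in enumerate(q):
    print(f"Q({action}): {value:.2f}")
print("best action:", best)
```

For this state the highest Q belongs to action 6. Because the matrix is rounded to two decimal places, the sums are only approximate.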