PICO-8 tracks CPU usage with two counters: Lua cycles and system cycles. Most operations only consume Lua cycles, but some functions have an additional system cycle cost.
There are 4194304 cycles in a second (2^22), so about 69905 cycles per frame at 60 FPS, or 139810 cycles at 30 FPS. The function call stat(1) returns the total (Lua + system) CPU usage for the current frame as a fraction of the frame's cycle budget, and stat(2) returns the system portion alone.
Example: since cls() uses 2 Lua cycles and 1024 system cycles, PICO-8 running at 60 FPS can call cls() about 2^22/60/1026 = 68 times per frame.
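The budget arithmetic above can be checked with a small Python sketch. Python is used here only as a calculator, the helper names are hypothetical, and cycles_used assumes that a stat(1) reading of 1.0 means the whole frame's budget:

```python
# Frame-budget arithmetic for PICO-8, which runs 2^22 cycles per second.
CYCLES_PER_SECOND = 2 ** 22  # 4194304


def cycles_per_frame(fps: int) -> int:
    """Whole cycles available in one frame at the given frame rate."""
    return CYCLES_PER_SECOND // fps


def max_calls_per_frame(cost_per_call: int, fps: int) -> int:
    """How many calls of a given total cost (Lua + system) fit in one frame."""
    return CYCLES_PER_SECOND // (fps * cost_per_call)


def cycles_used(stat1_ratio: float, fps: int) -> float:
    """Convert a stat(1) reading into cycles (assumes 1.0 = full frame)."""
    return stat1_ratio * CYCLES_PER_SECOND / fps


print(cycles_per_frame(60), cycles_per_frame(30))  # 69905 139810
print(max_calls_per_frame(2 + 1024, 60))           # 68
```

The same helper gives the per-frame ceiling for any call whose combined Lua and system cost is known.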
Optimization Tips
Some tips for when your code isn't running fast enough (these will increase your code's size and reduce its clarity, so it's a tradeoff):
- First, make sure you know why your code is running slowly: which part is costing the most time? Use time() or stat(1) calls to measure this, or just delete blocks of code to see where the problem lies.
- Focus on just the code causing the most slowdown (usually a while/for loop), and only until the desired speed is achieved, as optimizing all of your code will quickly run you out of tokens for no actual gain. (Often, 99% of the time is spent in 1% of the code, so optimizing the rest is pointless.)
- Having a printh of stat(1) just before the end of _update and _draw (or before the flip()) is invaluable here: it shows whether your game's actual performance is improving (or not) as you make optimizations.
- If an optimization doesn't seem to help actual performance (as measured by the stat(1) output from the previous point), you've probably failed to find the actual problem spot; spend more time looking for it.
Now that you've found the code causing the slowdown:
- You can always remove it if it's not essential. That's one of the only optimizations that will improve your code size and clarity, too!
- Forget about the code for a moment and consider what it's supposed to be doing: what's the fastest way to implement that? Can a clever algorithm or data structure avoid pointless calculation?
- For example, PICO-8 has a fair(ish) amount of Lua memory (2 MB), so a function that has a small (or sometimes not-so-small) set of possible inputs and does slow computations on them can often be replaced with a lookup table (which can be computed at startup if it's too large to fit in the code).
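A minimal sketch of the lookup-table idea, written in Python for illustration. slow_wave is a hypothetical stand-in for an expensive function whose input is known to be an integer from 0 to 255, so every result can be precomputed once:

```python
import math


def slow_wave(i: int) -> float:
    """Hypothetical stand-in for an expensive function of an 8-bit input."""
    angle = i / 256 * 2 * math.pi
    # Deliberately costly: sums several sine harmonics per call.
    return sum(math.sin(angle * k) / k for k in range(1, 8))


# Every possible input (0..255) is evaluated once, at startup.
WAVE_TABLE = [slow_wave(i) for i in range(256)]


def fast_wave(i: int) -> float:
    """Same result as slow_wave for 0..255, now a single table read."""
    return WAVE_TABLE[i & 255]  # & 255 wraps the index into the table's range
```

In a cart, the equivalent table would be filled once in _init() and read inside the hot loop; the trade is 256 stored numbers for one table access per call.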
Now onto the micro-optimizations:
- Function calls cost cycles, so inlining short functions (replacing the calls with the code inside the function) can help performance, in exchange for severely harming code size and clarity; use with care.
- Access to global or non-local variables (locals from other functions) is slower than access to local variables, so use local variables instead where possible. If a global or non-local variable is read multiple times, caching it in a local variable first saves cycles (this helps a bit even if the variable is read only twice).
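To get a feel for what inlining buys, the call and return costs from the Lua cycles list below (3+n cycles per call, n cycles per return) can be multiplied out. This is a rough Python model with hypothetical helper names; it ignores the function body and the extra global access when the function itself is a global:

```python
# Rough model of call overhead using the Lua cycle costs (tested on 0.1.12c):
# a call costs 3+n cycles for n arguments, and a return costs 1 cycle per
# returned value. The body and any global lookup of the function name are
# NOT counted.


def call_overhead(n_args: int, n_returns: int) -> int:
    """Cycles spent on one call/return pair, excluding the body."""
    return (3 + n_args) + n_returns


def inlining_savings(n_args: int, n_returns: int, calls_per_frame: int) -> int:
    """Approximate cycles per frame saved if every call is inlined."""
    return call_overhead(n_args, n_returns) * calls_per_frame


FRAME_BUDGET_60FPS = 2 ** 22 // 60  # 69905 cycles

saved = inlining_savings(n_args=1, n_returns=1, calls_per_frame=1000)
print(saved, f"{saved / FRAME_BUDGET_60FPS:.1%}")  # 5000 7.2%
```

So a one-argument, one-result helper called 1000 times per frame spends roughly 7% of a 60 FPS frame on call overhead alone, before doing any work.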
Lua cycles
Some standard Lua operation costs (tested on 0.1.12c):
- Access to global variables or to local variables from another function: 1 cycle per access. (Only local variables within the same function avoid this cost.)
- Assignment statement: minimum 1 cycle per assignment; does not combine with other costs. (E.g. 'a=b' and 'a=b+c' both cost only 1 cycle, while 'a,b=b,a' costs 2.)
- Regular arithmetic operators: 1 cycle (note that x^y is excluded: it has an additional, varying system cost).
- Logical operators: for a chain of N and/or operators, of which K are evaluated (not short-circuited), the cost is K+1 cycles ('not' is 1 cycle). (E.g. 'a and b and c' costs 1, 2, or 3, depending on how many are evaluated.)
- Relational operators: if directly inside an if/while condition, 0 cycles. If directly inside a logical operator, 1 cycle total for all relational operators within that chain of logical operators. If directly inside something else, 2 cycles. (E.g. 'a==b==c' and 'a==a and b==b and c==c' both cost 4 cycles.)
- String concatenation: 3 cycles
- Table element access: 1 cycle
- Table construction: 1+n+L cycles, where n is the number of elements, and L is 1 if any of the elements are list-style (without an explicit key), otherwise 0. (E.g. {a,b} costs 4 cycles, but {[1]=a,[2]=b} costs only 3. Funny.)
- Function call: 3+n cycles, where n is the number of arguments
- Function return: n cycles, where n is the number of return values (returning no values costs 0 cycles)
- If statement: roughly 1 cycle per evaluated if/elseif.
- While loop: roughly 1 cycle per iteration.
- Numeric for loop: 5+n cycles, where n is the number of iterations.
- do … end: 0 cycles
Lua CPU stats are only updated every 1024 cycles.
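A few of the formulas above, transcribed into a Python sketch so they can be combined when estimating a hot loop. The helper names are hypothetical, and each one counts only the construct's own overhead, not the work inside it:

```python
# A few Lua cycle formulas from the list above, as a rough estimator.
# Each counts only the construct's own overhead, not the work inside it.


def table_constructor_cost(n_elements: int, has_list_items: bool) -> int:
    """1 + n + L, where L = 1 if any element has no explicit key."""
    return 1 + n_elements + (1 if has_list_items else 0)


def logical_chain_cost(evaluated: int) -> int:
    """K + 1 for a chain of and/or operators, K of which are evaluated."""
    return evaluated + 1


def numeric_for_cost(iterations: int) -> int:
    """5 + n for a numeric for loop running n times (body not included)."""
    return 5 + iterations


print(table_constructor_cost(2, True))   # {a,b}: 4
print(table_constructor_cost(2, False))  # {[1]=a,[2]=b}: 3
print(numeric_for_cost(100))             # 105
```

For instance, a 100-iteration numeric for loop costs 105 cycles of loop overhead before anything in its body is counted.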
Functions that add negative Lua cycles
Some functions have negative Lua cycles associated with them that get subtracted from the Lua cycle count by the PICO-8 runtime. This mechanism allows PICO-8 to make these functions artificially cheaper.
For instance, shl(x,y) should cost 4 cycles because it is a function call with two arguments, but each call subtracts 3 cycles from the Lua cycle counter, for a total of 1 cycle.
The table below lists functions that have their total cost tweaked in this way.
Function | Adjusted cycles | Notes
peek(x), peek2(x), peek4(x) | 1 | only when called with 1 argument
poke(x,y) | 1 | only when called with 2 arguments
poke2(x,y), poke4(x,y) | 2 | only when called with 2 arguments
band(x,y), bor(x,y), bxor(x,y) | 1 | only when called with 2 arguments
bnot(x) | 1 | only when called with 1 argument
shl(x,y), shr(x,y), lshr(x,y) | 1 | only when called with 2 arguments
rotl(x,y), rotr(x,y) | 1 | only when called with 2 arguments
flr(x), ceil(x) | 1 | only when called with 1 argument
Notable math functions that do not have negative cycles are sgn(x), abs(x), the comparison functions (min, max, mid), and the trigonometric functions (sin, cos, atan2).
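The adjusted-cost table can be encoded as a lookup keyed by function name and argument count. This Python sketch (hypothetical names) returns None for calls the table does not cover rather than guessing their cost:

```python
# Adjusted total Lua cost from the table above, keyed by
# (function name, argument count). Only the listed arities get the discount.
ADJUSTED_COST = {}
for name in ("peek", "peek2", "peek4", "bnot", "flr", "ceil"):
    ADJUSTED_COST[(name, 1)] = 1
for name in ("poke", "band", "bor", "bxor", "shl", "shr", "lshr", "rotl", "rotr"):
    ADJUSTED_COST[(name, 2)] = 1
for name in ("poke2", "poke4"):
    ADJUSTED_COST[(name, 2)] = 2


def adjusted_cost(name: str, n_args: int):
    """Return the adjusted cycle cost, or None if this call gets no discount."""
    return ADJUSTED_COST.get((name, n_args))
```

For example, adjusted_cost("band", 3) returns None because the discount only applies to two-argument calls, and adjusted_cost("abs", 1) returns None because abs() is not in the table.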
System cycles
Only a few functions consume system cycles. These costs are added on top of each function's standard Lua cycle cost.
System CPU stats are updated after each call.
Here is the list, measured on PICO-8 0.1.11g:
Function | Cycles | Notes
cls() | 1024 | same cost as rectfill() of the same size
print() | 2+n*8 | n is the number of characters in the string, even those not rendered; spaces, newlines, and double-width glyphs each count as one character
spr() | n | n is the number of pixels drawn, including transparent pixels (width × height of the sprite rectangle); cost is 0 if the first argument is outside the [0, 255] range
sspr() | n | n is the number of pixels drawn, including transparent pixels (width × height of the destination rectangle)
rect() | max(1,2*ceil(a/4)) + max(0,2*ceil(b/21)) | where:
rectfill() | max(1,flr(n/16)) | n is the number of pixels drawn (width × height)
circ() | 2+n*4 | warning: this formula is incomplete for clipped circles
circfill() | n*flr((n+9)/4) | warning: this formula is incomplete for clipped circles
line() | ceil(n/2) | n is the number of pixels drawn; there is an additional cost of 1 if at least one pixel had to be clipped
map() / mapdraw() | max(1,n*64) | n is the number of sprites rendered; only map cells that are not zero are counted
music() | 16 | no cost if called with no argument
sfx() | 16 | no cost if called with no argument
memcpy() | n+1 | n is the number of bytes copied
memset() | max(1,ceil(n/2)) | n is the number of bytes set
reload() | max(1,n*8) | n is the number of bytes reloaded
btn() | 4 | no cost if called with no argument
btnp() | 4 | no cost if called with no argument
rnd() | 4 |
sqrt() | 24 | only 16 if the argument is zero
x^y | 8*(n+1) | n is the position of the last fractional bit in y; for instance, the cost is 8 for any integer such as y == 13, and 8*3 for y == 1.25
stat() | 16 |
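Several of the formulas in the table, written out as a Python sketch for quick estimates. The helper names are hypothetical; print_cycles assumes one Python character per PICO-8 character, and pow_cycles assumes y is a PICO-8 16.16 fixed-point number, so the "position of the last fractional bit" is counted from the binary point (1 for 0.5, 2 for 0.25):

```python
import math

# System-cycle formulas from the table above (measured on 0.1.11g).


def rectfill_cycles(pixels: int) -> int:
    """rectfill(): max(1, flr(n/16)), n = width * height."""
    return max(1, pixels // 16)


def cls_cycles() -> int:
    """cls() costs the same as a rectfill() over the full 128x128 screen."""
    return rectfill_cycles(128 * 128)


def print_cycles(text: str) -> int:
    """print(): 2 + 8 per character, counting spaces and newlines too."""
    return 2 + 8 * len(text)


def line_cycles(pixels: int, clipped: bool = False) -> int:
    """line(): ceil(n/2), plus 1 if any pixel had to be clipped."""
    return math.ceil(pixels / 2) + (1 if clipped else 0)


def memcpy_cycles(n_bytes: int) -> int:
    return n_bytes + 1


def memset_cycles(n_bytes: int) -> int:
    return max(1, math.ceil(n_bytes / 2))


def map_cycles(sprites_drawn: int) -> int:
    """map(): 64 per non-empty cell drawn, minimum 1."""
    return max(1, sprites_drawn * 64)


def pow_cycles(y: float) -> int:
    """x^y: 8*(n+1), n = position of the last fractional bit of y."""
    # Scale to 16.16 fixed point and keep only the 16 fractional bits.
    frac = round(y * 65536) & 0xFFFF
    if frac == 0:
        return 8  # integer exponent: n = 0
    lowest = (frac & -frac).bit_length() - 1  # index of the lowest set bit
    n = 16 - lowest  # 0x8000 (= 0.5) gives n = 1
    return 8 * (n + 1)


print(cls_cycles(), print_cycles("hello"), pow_cycles(1.25))  # 1024 42 24
```

cls_cycles() matches the cls() row because a full 128×128 screen is 16384 pixels; together with its 2 Lua cycles, that is about 1.5% of a 60 FPS frame.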