PICO-8 Wiki

PICO-8 keeps track of CPU usage using two values: Lua cycles and system cycles. Most operations affect Lua cycles, but some functions have an additional system cycle cost.

There are 4194304 cycles in a second (2^22), so about 69905 cycles per frame at 60 FPS, or 139810 cycles at 30 FPS. The function call stat(1) returns the total (Lua + system) cycle ratio for the current frame, and stat(2) returns the system ratio.

Example: since cls() uses 2 Lua cycles and 1024 system cycles, PICO-8 running at 60 FPS can call cls() about 2^22/60/1026 = 68 times.
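The arithmetic in the example above can be sketched as a small helper. The name calls_per_frame is hypothetical, not part of the PICO-8 API. Because PICO-8 numbers are 16.16 fixed point and top out just below 32768, the sketch counts in units of 1024 cycles rather than storing 69905 directly.

```lua
-- hypothetical helper: how many times a call fits into one frame
-- lua_cycles, sys_cycles: per-call costs taken from the tables on this page
-- fps: 30 or 60
function calls_per_frame(lua_cycles, sys_cycles, fps)
  -- 2^22 cycles per second / 1024 = 4096 units per second
  local budget = 4096 / fps
  -- cost of one call, in the same units of 1024 cycles
  local cost = (lua_cycles + sys_cycles) / 1024
  -- note: for very cheap calls the result would exceed 32767
  -- and overflow a PICO-8 number
  return flr(budget / cost)
end
```

For cls(), calls_per_frame(2, 1024, 60) gives 68, matching the figure above, and calls_per_frame(2, 1024, 30) gives 136.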

Optimization Tips:

Some tips for when your code isn't running fast enough. Note that most of them increase your code's size and reduce its clarity, so each one is a trade-off.

  • First, make sure you know why your code is running slowly: which part costs the most time? Use time() or stat(1) calls to measure this, or temporarily delete blocks of code to see where the problem lies.
  • Focus only on the code causing the most slowdown (usually a while/for loop), and stop once the desired speed is reached; optimizing your whole program will quickly run you out of tokens for no actual gain. (Often, 99% of the time is spent in 1% of the code, so optimizing the rest is pointless.)
  • Print stat(1) with printh at the end of _update and _draw (or just before flip). This shows whether your game's actual performance is improving as you make optimizations, which is invaluable here.
  • If an optimization doesn't improve actual performance (as measured by the stat(1) printout from the previous point), you probably haven't found the real problem point yet. Spend more time locating it.
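The measuring tips above can be sketched as follows. The helper measure and the placeholder draw_world are hypothetical names chosen for illustration; stat(1), printh, cls and rectfill are the regular PICO-8 functions.

```lua
-- hypothetical helper: returns the change in stat(1)
-- caused by calling f() once
function measure(f)
  local before = stat(1)
  f()
  return stat(1) - before
end

-- placeholder for the code under suspicion
function draw_world()
  for i = 1, 20 do
    rectfill(0, 0, 127, 127, i % 16)
  end
end

function _draw()
  cls()
  local used = measure(draw_world)
  -- log to the host console so the numbers don't disturb the screen
  printh("draw_world: "..used.."  frame total: "..stat(1))
end
```

Two caveats from the tables on this page apply: Lua CPU stats are only updated every 1024 cycles, so a very short block may measure as 0 unless it is repeated in a loop, and each stat() call itself has a system cycle cost.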

Once you have found the code causing the slowdown:

  • You can always remove it if it's not essential. That's one of the only optimizations that will improve your code size and clarity, too!
  • Forget about the code for a moment and consider what it's supposed to be doing - what's the fastest way that can be implemented? Can a clever algorithm or data structure be used to avoid pointless calculation?
  • For example, PICO-8 has a fair(ish) amount of Lua memory (2 MB), so a function that has a small (or sometimes not-so-small) set of possible inputs and does slow computations on them can often be replaced with a lookup table (which can be computed at startup time, if it is too large to fit in the code).

Now onto the micro-optimizations:

  • Function calls cost, so inlining short calls (replacing the calls with the code inside the function) can help performance (in exchange for severely harming code size and clarity - use with care).
  • Access to global or non-local variables (locals from other functions) is slower than access to local variables - use local variables instead, if possible. If a global or non-local variable is read multiple times, it'd save cycles to cache it in a local variable first (this helps a bit even if the variable's read twice).
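The lookup-table and local-caching tips above can be sketched together. This is a minimal illustration, assuming the inputs are integers in 0..255; sqrt_lut and sum_roots are hypothetical names. sqrt() is used because the system cycle table on this page lists it as comparatively expensive, while a table element access is listed at 1 Lua cycle.

```lua
-- hypothetical table of square roots for the integers 0..255
sqrt_lut = {}

-- computed once at startup instead of on every use
function _init()
  for i = 0, 255 do
    sqrt_lut[i] = sqrt(i)
  end
end

-- sums the square roots of a list of integers in 0..255
function sum_roots(values)
  -- cache the global table in a local: the global is read once,
  -- not once per iteration
  local lut = sqrt_lut
  local total = 0
  for i = 1, #values do
    total = total + lut[values[i]]
  end
  return total
end
```

Whether the table actually wins depends on the loop; measuring with stat(1) before and after the change is the only reliable check.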

Lua cycles

Some standard Lua operation costs: (tested on 0.1.12c)

  • Access to global variables or to local variables from another function: 1 cycle per access. (Only local variables within same function avoid this cost)
  • Assignment statement: minimum 1 cycle per assignment, does not combine with other costs. (E.g. 'a=b' and 'a=b+c' both cost only 1 cycle, 'a,b=b,a' costs 2)
  • Regular arithmetic operators: 1 cycle (note that x^y is excluded - has an additional varying system cost)
  • Logical operators: for a chain of N and/or operators, of which K are evaluated (not short-circuited), the cost is K+1 cycles. ('not' is 1 cycle). (E.g. 'a and b and c' costs 1, 2, or 3, depending on how many are evaluated)
  • Relational operators: if directly inside an if/while condition: 0 cycles. If directly inside a logical operator: 1 cycle total for all relational operators within that chain of logical operators. If directly inside something else: 2 cycles. (E.g. 'a==b==c' and 'a==a and b==b and c==c' both cost 4 cycles)
  • String concatenation: 3 cycles
  • Table element access: 1 cycle
  • Table construction: 1+n+L cycles, where n is the number of elements, and L is 1 if some of the elements are list-style (without an explicit key) or otherwise 0. (E.g. {a,b} is 4 cycles, but, funnily enough, {[1]=a,[2]=b} is only 3 cycles)
  • Function call: 3+n cycles, where n is the number of arguments
  • Function return: n cycles, where n is the number of return values. (returning no values costs 0 cycles)
  • If statement: roughly 1 cycle per evaluated if/elseif.
  • While loop: roughly 1 cycle per iteration.
  • Numeric for loop: 5+n cycles, where n is the number of iterations.
  • do … end: 0 cycles
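As an illustration, here are a few of the rules above applied to concrete statements. The cycle counts in the comments simply restate the list; they have not been re-measured, and cost_demo is a hypothetical name.

```lua
-- all variables are locals of this function,
-- so the global-access cost does not apply
function cost_demo()
  local a, b, c = 1, 2, 3
  a = b                          -- 1 cycle: one assignment
  a = b + c                      -- 1 cycle: assignment does not combine with the arithmetic cost
  a, b = b, a                    -- 2 cycles: two assignments
  local t1 = {a, b}              -- table construction: 1 + 2 elements + 1 (list-style) = 4
  local t2 = {[1] = a, [2] = b}  -- table construction: 1 + 2 elements = 3
  for i = 1, 10 do end           -- numeric for loop: 5 + 10 iterations = 15
  return a, b, t1, t2
end
```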

Lua CPU stats are only updated every 1024 cycles.

Functions that add negative Lua cycles

Some functions have negative Lua cycles associated with them that get subtracted from the Lua cycle count by the PICO-8 runtime. This mechanism allows PICO-8 to make these functions artificially cheaper.

For instance, shl(x,y) should cost 4 cycles because it is a function call with two arguments, but each call subtracts 3 cycles from the Lua cycle counter, for a total of 1 cycle.

The table below lists functions that have their total cost tweaked in this way.

| Function | Adjusted cycles | Notes |
|---|---|---|
| peek(x), peek2(x), peek4(x) | 1 | only when called with 1 argument |
| poke(x,y) | 1 | only when called with 2 arguments |
| poke2(x,y), poke4(x,y) | 2 | only when called with 2 arguments |
| band(x,y), bor(x,y), bxor(x,y) | 1 | only when called with 2 arguments |
| bnot(x) | 1 | only when called with 1 argument |
| shl(x,y), shr(x,y), lshr(x,y) | 1 | only when called with 2 arguments |
| rotl(x,y), rotr(x,y) | 1 | only when called with 2 arguments |
| flr(x), ceil(x) | 1 | only when called with 1 argument |

Notable math functions that do not have negative cycles are: sgn(x), abs(x), the comparison functions, and the trigonometric functions.

System cycles

Only a few functions consume system cycles. These are charged in addition to the function's standard Lua cycle cost.

System CPU stats are updated after each call.

Here is the list, measured on PICO-8 0.1.11g:

| Function | Cycles | Notes |
|---|---|---|
| cls() | 1024 | same cost as rectfill of same size |
| print() | 2+n*8 | n is the number of characters in the string, even those not rendered; spaces, newlines, and double-width glyphs each count as one character |
| spr() | n | n is the number of pixels drawn, including transparent pixels (width × height of the sprite rectangle); cost is 0 if first argument is outside the [0, 255] range |
| sspr() | n | n is the number of pixels drawn, including transparent pixels (width × height of the destination rectangle) |
| rect() | max(1,2*ceil(a/4)) + max(0,2*ceil(b/2-1)) | where w,h = abs(x2-x1),abs(y2-y1) and a,b = max(w,h),min(w,h) |
| rectfill() | max(1,flr(n/16)) | n is the number of pixels drawn (width × height) |
| circ() | 2+n*4 | warning: that formula is incomplete for clipped circles |
| circfill() | n*flr((n+9)/4) | warning: that formula is incomplete for clipped circles |
| line() | ceil(n/2) | n is the number of pixels drawn; there is an additional cost of 1 if at least one pixel had to be clipped |
| map() / mapdraw() | max(1,n*64) | n is the number of sprites rendered; only cells that are not zero in the map are considered |
| music() | 16 | no cost if no argument |
| sfx() | 16 | no cost if no argument |
| memcpy() | n+1 | n is the number of bytes copied |
| memset() | max(1,ceil(n/2)) | n is the number of bytes set |
| reload() | max(1,n*8) | n is the number of bytes reloaded |
| btn() | 4 | no cost if no argument |
| btnp() | 4 | no cost if no argument |
| rnd() | 4 | |
| sqrt() | 24 | only 16 if argument is zero |
| x^y | 8*(n+1) | n is the position of the last fractional bit in y; for instance, cost is 8 for any integer such as y == 13, and is 8*3 for y == 1.25 |
| stat() | 16 | |
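A few of the formulas in the table can be written out as functions for quick estimates. This is only a sketch: the function names are hypothetical, the formulas are copied from the table as measured, and clipping is not accounted for.

```lua
-- system cycles for rectfill, given the filled area in pixels
function rectfill_cost(width, height)
  return max(1, flr(width * height / 16))
end

-- system cycles for rect, given the corner coordinates
function rect_cost(x1, y1, x2, y2)
  local w, h = abs(x2 - x1), abs(y2 - y1)
  local a, b = max(w, h), min(w, h)
  return max(1, 2 * ceil(a / 4)) + max(0, 2 * ceil(b / 2 - 1))
end

-- system cycles for print, given the string to print
function print_cost(s)
  return 2 + #s * 8
end
```

For example, rectfill_cost(128, 128) gives 1024, which matches the cls() row, and print_cost("hello") gives 42.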