

PICO-8 keeps track of CPU usage using two values: Lua cycles and system cycles. Most operations affect Lua cycles, but some functions have an additional system cycle cost.

There are 8,388,608 cycles per second (2^23), which is about 139,810 cycles per frame at 60 FPS, or 279,620 cycles per frame at 30 FPS. The function call stat(1) returns the total fraction of the current frame spent on Lua cycles + system cycles, and stat(2) returns the fraction spent on just system cycles.

For example, cls() uses 4 Lua cycles and 2048 system cycles, for a total of 2052. Assuming PICO-8 is running at 60 FPS, the number of times it can be called per frame is: 8,388,608 cycles/s / 60 frames/s / 2052 cycles ≈ 68 times.
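
The arithmetic above can be checked with a few lines of plain Lua. This is only a sketch, meant for a desktop Lua interpreter rather than PICO-8 itself, because values such as 8,388,608 exceed PICO-8's 16.16 fixed-point number range. The helper name calls_per_frame is hypothetical.

```lua
-- cycle budget from the text above: 2^23 cycles per second
local CYCLES_PER_SECOND = 2^23

-- hypothetical helper: how many calls of a given cost fit in one frame
local function calls_per_frame(fps, lua_cycles, system_cycles)
  local budget = CYCLES_PER_SECOND / fps
  return math.floor(budget / (lua_cycles + system_cycles))
end

print(calls_per_frame(60, 4, 2048)) -- cls() at 60 FPS: 68
print(calls_per_frame(30, 4, 2048)) -- cls() at 30 FPS: 136
```

The same helper can be fed any Lua/system cost pair from the tables on this page to get a rough per-frame ceiling.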

Optimization Tips

Some tips for when your code isn't running fast enough. (Most of these will increase your code's size and reduce its clarity, so each one is a trade-off.)

  • First, make sure you know why your code is running slow - which part's costing the most time? Use time() or stat(1) calls to measure this, or just delete blocks of code to see where the problem lies.
  • Focus on just the code causing the most slowdown (usually a while/for loop), and only until the desired speed is achieved, as optimizing your whole code will quickly run you out of tokens for no actual gain. (Often, 99% of the time is spent in 1% of the code. Optimizing the rest of the code is pointless).
  • Add a printh of stat(1) just before the end of _update and _draw (or before the flip). Seeing how your game's actual performance improves (or not) as you make optimizations is invaluable here.
  • If an optimization doesn't seem to help actual performance (as measured by the stat(1) output from the previous point), you've probably failed to find the actual problem point. Spend more time locating it.
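
The measuring tips above could be sketched as follows. This is a minimal sketch, not a definitive profiler: measure and frame_report are hypothetical helper names, and the two stand-in definitions at the top exist only so the code also runs outside PICO-8 (inside PICO-8 the real stat() and printh() are used).

```lua
-- stand-ins so the sketch also runs outside PICO-8 (assumption);
-- inside PICO-8 the built-in stat() and printh() are picked up instead
local stat = stat or function(_) return 0 end
local printh = printh or print

-- hypothetical helper: fraction of the frame consumed while running fn
local function measure(label, fn)
  local before = stat(1)
  fn()
  local used = stat(1) - before
  printh(label .. ": " .. used)
  return used
end

-- hypothetical helper: split the frame so far into total and system shares
local function frame_report()
  local total = stat(1)   -- Lua + system cycles, as a fraction of the frame
  local system = stat(2)  -- system cycles only
  printh("total " .. total .. " system " .. system .. " lua " .. (total - system))
  return total, system
end
```

Wrapping a suspect block in measure("label", function() ... end) and calling frame_report() at the end of _draw gives numbers to compare before and after each change. The stat() calls and the wrapper itself cost cycles too, so very small readings should be treated as noise.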

Now that you have found the code causing the slowdown:

  • You can always remove it if it's not essential. That's one of the only optimizations that will improve your code size and clarity, too!
  • Forget about the code for a moment and consider what it's supposed to be doing - what's the fastest way that can be implemented? Can a clever algorithm or data structure be used to avoid pointless calculation?
  • For example, PICO-8 has a fair(ish) amount of Lua memory (2 MB), so a function that has a small (or sometimes not-so-small) set of possible inputs and does slow computations on them can often be replaced with a lookup table (which can be computed at startup time, if it is too large to fit in the code).
  • Now onto the micro-optimizations:
  • Function calls cost, so inlining short calls (replacing the calls with the code inside the function) can help performance (in exchange for severely harming code size and clarity - use with care).
  • Access to global or non-local variables (locals from other functions) is slower than access to local variables, so use local variables where possible. If a global or non-local variable is read multiple times, caching it in a local variable first saves cycles (this helps a bit even if the variable is read only twice).
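
Two of the tips above, the lookup table and caching a global in a local, might look like this in practice. This is an illustrative sketch: slow_falloff, falloff_lut and sum_falloff are hypothetical names, and the function being tabulated is just a placeholder for whatever slow computation your game actually does.

```lua
-- hypothetical slow function with a small input set (integers 0..127)
local function slow_falloff(d)
  return 1 / (1 + d * d)
end

-- lookup table built once at startup (e.g. from _init)
falloff_lut = {}
for d = 0, 127 do
  falloff_lut[d] = slow_falloff(d)
end

-- hot loop: one table read per item instead of a function call
local function sum_falloff(distances)
  local lut = falloff_lut  -- cache the global in a local before the loop
  local total = 0
  for i = 1, #distances do
    total = total + lut[distances[i]]
  end
  return total
end

assert(sum_falloff({0, 1}) == 1.5)
```

Whether the table actually wins depends on how expensive the replaced function is, so it is worth confirming with stat(1) before and after.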

Lua cycles

Some standard Lua operation costs: (tested on 0.2.0h)

  • Variable access (read):
    • Local variables in same function: 0 cycles.
    • Global variables: 2 cycles.
    • Upvalues (local variables in another function): 2 cycles.
  • Assignment statement:
    • Simple (x=y): 0 cycles if the right-hand side expression already has a cycle cost, 2 cycles otherwise. (Yes, this means x=x+y is cheaper than x=y.) [Note: this may be changed/fixed in the future]
    • Multiple (x1,x2,..,xn = y1,y2,..,yk): (max(n,k) - 1) * 2 cycles, plus 2 cycles for each right side expression without a cycle cost. (E.g. x,y=y,x is 6 cycles). [Note: this used to be cheaper. It might get changed/fixed again in the future]
  • Arithmetic operators:
    • additive operators (+, -): 1 cycle
    • multiplicative operators (*, /, %, \): 2 cycles
    • unary minus (-): 2 cycles [Note: this is odd]
    • exponentiation (^): 2 regular cycles plus a considerable system cycle cost, described in the System cycles section below
  • Local Declaration:
    • Default-initialized (local x,y,z): 2 cycles, regardless of the number of locals.
    • Initialized: 2 cycles per initialized local.
  • Bitwise operators (&, |, ^^, <<, >>, >>>, <<>, >><, ~): 1 cycle.
  • Logical operators:
    • and/or: 0 cycles if short-circuited, 2 cycles otherwise. +2 extra cycles unless directly inside an if/while/and/or.
    • unary not: 2 cycles.
  • Relational operators (<, >, <=, >=, ==, !=): 2 cycles. +2 extra cycles unless directly inside an if/while/and/or.
  • String concatenation operator (..): 6 cycles
  • Memory peek operators (@, %, $): 1 cycle
  • Table element access: 2 cycles
  • Table construction:
    • With at least one positional (list-style) element: 4 cycles + 2 cycles per (any) element.
    • Otherwise: 2 cycles + 2 cycles per (any) element.
    • The 2 cycles per element cost is max'ed with the cost of the expression that defines that element. (So {1+2}, which has one positional element, costs 6 cycles, not 7)
    • (Note: This means that {a,b} is 8 cycles, but {[1]=a,[2]=b} is 6 cycles. Funny)
  • Table length (#): 2 cycles.
  • Function construction: 2 cycles. [Todo: even if it captures locals? That definitely wasn't the case before...]
  • Function call: 4 cycles + 2 cycles per argument.
    • The 2 cycles per argument cost is max'ed with the cost of the expression that defines that argument. (So func(1+2) costs 6 cycles, not 7)
    • This cost is the same regardless of whether the function is accessed through a local, a global, or an upvalue.
  • Function return: 2 cycles + 2 cycles per return value.
    • If a function returns without an explicit return statement, that also costs 2 cycles. (You can think of it as an implicit return statement)
  • If statement: 2 cycles per evaluated if/elseif.
    • This cost is max'ed with the cost of the expression in the if/elseif.
  • While loop: 2 + 4n cycles, where n is the number of iterations. (Todo: that much?! Need to double-check)
    • 2 cycles per iteration are max'ed with the cost of the expression in the while.
  • Numeric for loop: 10 + 2n, where n is the number of iterations.
  • do … end: 0 cycles
  • Metamethod access: 0 cycles (doesn't include cost of the metamethod itself)
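
Some of the rules in the list above can be turned into a small estimator, which makes it easier to compare alternatives without memorizing the numbers. This is a sketch that only encodes the function call, table construction and multiple assignment rules as stated above; all helper names are hypothetical, and the results are only as accurate as the list itself.

```lua
-- each argument or element costs at least 2 cycles (the "max'ed" rule)
local function maxed(cost)
  if cost < 2 then return 2 end
  return cost
end

local function sum_maxed(expr_costs)
  local total = 0
  for i = 1, #expr_costs do
    total = total + maxed(expr_costs[i])
  end
  return total
end

-- function call: 4 cycles + 2 per argument (max'ed with the argument cost)
local function call_cost(arg_costs)
  return 4 + sum_maxed(arg_costs)
end

-- table construction: 4 base cycles with a positional element, else 2
local function table_cost(has_positional, elem_costs)
  local base = 2
  if has_positional then base = 4 end
  return base + sum_maxed(elem_costs)
end

-- multiple assignment overhead: (max(n,k)-1)*2, plus 2 per zero-cost
-- right side (the cost of the right side expressions is not included)
local function multi_assign_cost(n_targets, rhs_costs)
  local k = #rhs_costs
  local widest = n_targets
  if k > widest then widest = k end
  local total = (widest - 1) * 2
  for i = 1, k do
    if rhs_costs[i] == 0 then total = total + 2 end
  end
  return total
end

print(call_cost({1}))               -- func(1+2): 6
print(table_cost(true, {0, 0}))     -- {a,b}: 8
print(table_cost(false, {0, 0}))    -- {[1]=a,[2]=b}: 6
print(multi_assign_cost(2, {0, 0})) -- x,y=y,x: 6
```

Each list passed in holds the cycle cost of the individual argument, element or right side expression, with 0 standing for a plain local variable read.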

Lua CPU stats were only updated every 2048 cycles as of 0.1.12c, but in 0.2.0 they started being updated at a precision closer to once per conceptual operation.

Functions that add negative Lua cycles

Some functions have negative Lua cycles associated with them that get subtracted from the Lua cycle count by the PICO-8 runtime. This mechanism allows PICO-8 to make these functions artificially cheaper.

For instance, poke(x,y) should cost 8 cycles because it is a function call with two arguments, but each call subtracts 4 cycles from the Lua cycle counter, for a total of 4 cycles.

The table below lists functions that have their total cost tweaked in this way.

| Function | Adjusted cycles | Notes |
|---|---|---|
| peek(x), peek2(x), peek4(x) | 4 | Only when called with 1 argument. Operators are faster still. |
| poke(x,y), poke2(x,y), poke4(x,y) | 4 | Only when called with 2 arguments. |
| band(x,y), bor(x,y), bxor(x,y) | 4 | Only when called with 2 arguments. Operators are faster still. |
| bnot(x) | 4 | Only when called with 1 argument. Operators are faster still. |
| shl(x,y), shr(x,y), lshr(x,y) | 4 | Only when called with 2 arguments. Operators are faster still. |
| rotl(x,y), rotr(x,y) | 4 | Only when called with 2 arguments. Operators are faster still. |
| flr(x), ceil(x) | 4 | Only when called with 1 argument. |
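
As a small illustration of "operators are faster still", the two forms below compute the same value. Going by the figures on this page, the function form is listed at 4 adjusted cycles and the & operator at 1 cycle. This is a sketch: the stand-in on the first line is an assumption that only matters outside PICO-8, where band() does not exist, and it needs Lua 5.3 or later for the & operator.

```lua
-- stand-in for running outside PICO-8 (assumption); PICO-8 uses its own band()
local band = band or function(a, b) return a & b end

local flags = 0x0f
local via_call = band(flags, 0x05)  -- function form
local via_op = flags & 0x05         -- operator form, same result

assert(via_call == via_op)
```

In a hot loop the operator form also avoids the global lookup for band.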

Functions that add Lua cycles

A few functions consume additional Lua cycles (in addition to the standard cost of 2+(#arguments)):

Out of date - measured on PICO-8 0.1.12d RC10:

| Function | Additional cycles | Notes |
|---|---|---|
| add() | 10 | |
| all() | ??? | TODO - Results wildly unclear |
| del() | if n-s > 0 then 8+(2+n-s)*6 else 8 | n is the size of the table. s is 1 if deleted and 0 otherwise. |
| foreach() | if n > 0 then 4+n*(10+c) else 24 | n is the size of the table. c is the cost of the function passed to the foreach. |
| tostr() | if table then 28 else 18 | table is true if the argument is a table. |
| printh() | 32 | |
| menuitem() | 32 | |
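
The conditional formulas for del() and foreach() above can be written out as plain functions to see how they scale with table size. This is a direct transcription of the (out of date) formulas, with hypothetical helper names; it returns only the additional cycles, not the standard call cost.

```lua
-- additional cycles for del(); n = table size, deleted = was an element removed
local function del_extra(n, deleted)
  local s = 0
  if deleted then s = 1 end
  if n - s > 0 then
    return 8 + (2 + n - s) * 6
  end
  return 8
end

-- additional cycles for foreach(); n = table size, c = cost of the callback
local function foreach_extra(n, c)
  if n > 0 then
    return 4 + n * (10 + c)
  end
  return 24
end

print(del_extra(10, true))   -- 74
print(foreach_extra(10, 6))  -- 164
print(foreach_extra(0, 6))   -- 24
```

Both costs grow linearly with the table size, which is why removing elements one at a time from a large table inside a loop can add up quickly.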

The following functions neither add nor subtract cycles, and cost the standard amount:

sgn(), abs(), sin(), cos(), atan2().

min(), max(), mid().

camera(), clip(), cursor(), fillp(), pal(), palt().

fget(), fset(), mget(), mset(), pget(), pset(), sget(), sset().

cocreate(), coresume(), costatus(), dget(), dset(), time(), type().

getmetatable(), setmetatable(), pairs(), next(), rawget(), rawset().

sub(), tonum().

System cycles

A few functions consume system cycles. Note that these are added on top of the function's standard Lua cycle cost.

System CPU stats are updated after each call.

Out of date - measured on PICO-8 0.1.11g:

| Function | Cycles | Notes |
|---|---|---|
| cls() | 2048 | same cost as rectfill of same size |
| print() | 4+n*16 | n is the number of characters in the string, even those not rendered; spaces, newlines, and double-width glyphs each count as one character |
| spr() | 2*n | n is the number of pixels drawn, including transparent pixels (width × height of the sprite rectangle); cost is 0 if first argument is outside the [0, 255] range |
| sspr() | 2*n | n is the number of pixels drawn, including transparent pixels (width × height of the destination rectangle) |
| rect() | 2*max(1,2*ceil(a/4)) + 2*max(0,2*ceil(b/2-1)) | where w,h = abs(x2-x1),abs(y2-y1) and a,b = max(w,h),min(w,h) |
| rectfill() | 2*max(1,flr(n/16)) | n is the number of pixels drawn (width × height) |
| circ() | 4+n*8 | warning: that formula is incomplete for clipped circles |
| circfill() | 2*n*flr((n+9)/4) | warning: that formula is incomplete for clipped circles |
| line() | 2*ceil(n/2) | n is the number of pixels drawn; there is an additional cost of 1 if at least one pixel had to be clipped |
| map() / mapdraw() | 2*max(1,n*64) | n is the number of sprites rendered; only cells that are not zero in the map are considered |
| music() | 32 | no cost if no argument |
| sfx() | 32 | no cost if no argument |
| memcpy() | 2*(n+1) | n is the number of bytes copied |
| memset() | 2*max(1,ceil(n/2)) | n is the number of bytes set |
| cstore() | 2*max(1,n*64) | n is the number of bytes stored |
| reload() | 2*max(1,n*8) | n is the number of bytes reloaded |
| btn() | 8 | no cost if no argument |
| btnp() | 8 | no cost if no argument |
| rnd() | 8 | |
| srand() | 16 | |
| sqrt() | 48 | only 32 if argument is zero |
| x^y | 16*(n+1) | n is the position of the last fractional bit in y; for instance, cost is 16 for any integer such as y == 13, and is 16*3 for y == 1.25 |
| stat() | 32 | |
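
A few of the drawing formulas above can be evaluated directly to estimate what a draw call costs in system cycles. This is a rough sketch built on the (out of date) figures in the table; the helper names are hypothetical, and the first line is a stand-in so the code also runs outside PICO-8, where flr() does not exist.

```lua
-- stand-in for running outside PICO-8 (assumption); PICO-8 uses its own flr()
local flr = flr or math.floor

-- rectfill(): 2*max(1,flr(n/16)), n = width * height
local function rectfill_sys(w, h)
  local blocks = flr(w * h / 16)
  if blocks < 1 then blocks = 1 end
  return 2 * blocks
end

-- spr(): 2*n, n = width * height of the sprite rectangle in pixels
local function spr_sys(w, h)
  return 2 * w * h
end

-- print(): 4+n*16, n = number of characters
local function print_sys(nchars)
  return 4 + nchars * 16
end

print(rectfill_sys(128, 128)) -- 2048, the same as cls()
print(spr_sys(8, 8))          -- 128
print(print_sys(5))           -- 84
```

The full-screen rectfill result matches the 2048 system cycles listed for cls(), which is a quick sanity check on the formula.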