PICO-8 keeps track of CPU usage using two values: Lua cycles and system cycles. Most operations affect Lua cycles, but some functions have an additional system cycle cost.

There are 8,388,608 cycles per second (2^23), which is about 139,810 cycles per frame at 60 FPS, or 279,620 cycles per frame at 30 FPS. The function call `stat(1)` returns the total fraction of the current frame spent on Lua cycles + system cycles, and `stat(2)` returns the fraction spent on just system cycles.

For example, `cls()` uses 4 Lua cycles and 2048 system cycles, for a total of 2052. Assuming PICO-8 is running at 60 FPS, we can calculate how many times per frame it can be called: `8,388,608 cyc/s / 60 frames/s / 2052 cyc ≈ 68.1`, so 68 times.
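The same arithmetic as a small Python helper. It is meant to run on a desktop, not inside PICO-8, whose 16.16 fixed-point numbers top out just below 32768 and can't hold these totals. The function names are illustrative only:

```python
# Back-of-the-envelope PICO-8 CPU budget, using the figures quoted above:
# 2^23 cycles per second, shared evenly between the frames of that second.
CYCLES_PER_SECOND = 2 ** 23  # 8,388,608


def cycles_per_frame(fps: int = 60) -> float:
    """Cycle budget of a single frame at the given frame rate (30 or 60)."""
    return CYCLES_PER_SECOND / fps


def calls_per_frame(total_cost: int, fps: int = 60) -> int:
    """Whole number of calls costing `total_cost` cycles (Lua + system) that fit in one frame."""
    return CYCLES_PER_SECOND // (fps * total_cost)


if __name__ == "__main__":
    print(round(cycles_per_frame(60)))  # 139810
    print(round(cycles_per_frame(30)))  # 279620
    cls_cost = 4 + 2048  # cls(): 4 Lua cycles + 2048 system cycles
    print(calls_per_frame(cls_cost, fps=60))  # 68
    print(calls_per_frame(cls_cost, fps=30))  # 136
```

Halving the frame rate doubles the per-frame budget, so the same call fits 136 times per frame at 30 FPS.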

## Optimization Tips

Some tips for when your code isn't running fast enough (most of these increase your code's size and reduce its clarity, so it's a trade-off):

• First, make sure you know why your code is slow: which part costs the most time? Use time() or stat(1) calls to measure this, or simply delete blocks of code to see where the problem lies.
• Focus only on the code causing the most slowdown (usually a while/for loop), and only until the desired speed is reached. Optimizing all of your code will quickly use up your tokens for no real gain: often, 99% of the time is spent in 1% of the code, and optimizing the rest is pointless.
• Logging stat(1) with printh at the end of _update and _draw (or just before flip()) is invaluable here: it shows whether your game's actual performance is improving (or not) as you make optimizations.
• If an optimization doesn't seem to improve actual performance (as measured by the stat(1) output above), you probably haven't found the real problem yet; spend more time locating it.
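`stat(1)` readings are fractions of a frame, which makes them awkward to compare with the per-operation cycle costs listed further down. A minimal desktop-side Python sketch for converting them, assuming `stat(1)` is measured against the frame budget at the game's current frame rate (the readings in the example are made up):

```python
# Convert stat(1) readings (fraction of a frame) into approximate cycle counts,
# so before/after measurements can be compared with the per-operation costs on this page.
CYCLES_PER_SECOND = 2 ** 23


def stat1_to_cycles(stat1: float, fps: int = 60) -> float:
    """Approximate Lua + system cycles used in a frame, given a stat(1) reading.
    Assumes stat(1) is relative to the frame budget at `fps`."""
    return stat1 * CYCLES_PER_SECOND / fps


def cycles_saved(before: float, after: float, fps: int = 60) -> float:
    """Cycles per frame saved by an optimization, from two stat(1) readings."""
    return stat1_to_cycles(before, fps) - stat1_to_cycles(after, fps)


if __name__ == "__main__":
    # Hypothetical readings: 0.90 of the frame before an optimization, 0.60 after.
    print(round(cycles_saved(0.90, 0.60)))  # 41943
```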

Once you've found the code causing the slowdown:

• You can always remove it if it's not essential. That's one of the few optimizations that also improves your code size and clarity!
• Forget about the code for a moment and consider what it's supposed to be doing: what's the fastest way that can be implemented? Can a clever algorithm or data structure avoid pointless calculation?
• For example, PICO-8 has a fair(ish) amount of Lua memory (2 MB), so a function that has a small (or sometimes not-so-small) set of possible inputs and does slow computations on them can often be replaced with a lookup table. If the table is too large to fit in the code, it can be computed at startup.

Now onto the micro-optimizations:

• Function calls cost cycles, so inlining short functions (replacing each call with the code inside the function) can help performance, in exchange for severely harming code size and clarity. Use with care.
• Accessing global or non-local variables (upvalues, i.e. locals of an enclosing function) is slower than accessing local variables, so use local variables where possible. If a global or non-local variable is read multiple times, caching it in a local variable first saves cycles (this helps a bit even if the variable is read only twice).
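To get a feel for the numbers, here is a rough Python model of both micro-optimizations built from the per-operation costs in the "Lua cycles" section of this page. It is an estimate derived from that list, not a measurement, and it assumes arguments and return values are plain locals (the helper names are hypothetical):

```python
# Rough cycle accounting for the two micro-optimizations above, using the
# per-operation costs listed in the "Lua cycles" section (measured on 0.2.4).
GLOBAL_READ = 2  # reading a global or an upvalue
LOCAL_READ = 0   # reading a local of the same function


def reads_without_cache(n: int) -> int:
    """Cost of reading the same global n times."""
    return n * GLOBAL_READ


def reads_with_cache(n: int) -> int:
    """Cost of `local v=g` followed by n reads of v. The declaration adds nothing,
    because its right-hand side (the global read) already has a cycle cost."""
    return GLOBAL_READ + n * LOCAL_READ


def call_overhead(n_args: int, n_returns: int = 1, via_local: bool = False) -> int:
    """Approximate overhead of one call whose arguments and return values are plain locals:
    function lookup (2 for a global, 0 for a same-function local), call base cost
    (4, or 3 through a same-function local), 1 per argument, 2 + 1 per returned value."""
    lookup = LOCAL_READ if via_local else GLOBAL_READ
    base = 3 if via_local else 4
    return lookup + base + n_args + 2 + n_returns


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, reads_without_cache(n) - reads_with_cache(n))  # savings: 0, 2, 4, 6
    print(call_overhead(2))                  # 11
    print(call_overhead(2, via_local=True))  # 8
```

By this model, caching starts paying off at the second read, and inlining one call to a global function that returns a single value saves roughly 9 cycles plus one per argument.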

## Lua cycles

Costs of some standard Lua operations (tested on 0.2.4):

• Local variables in the same function: 0 cycles.
• Global variables: 2 cycles.
• Upvalues (locals of an enclosing function, accessed from a closure): 2 cycles.
• Assignment statement:
  • Simple (x=y): minimum 1 cycle (0 cycles if the right side already has a cycle cost).
  • Multiple (x1,x2,..,xn = y1,y2,..,yk): (max(n,k) - 1) cycles, plus 1 cycle for each right-side expression without a cycle cost. (E.g. x,y=y,x is 3 cycles.)
• Arithmetic operators:
  • Additive operators (+, -): 1 cycle.
  • Multiplicative operators (*, /, %, \): 2 cycles.
  • Unary minus (-): 1 cycle.
  • Exponentiation (^): 2 regular cycles plus a considerable system cycle cost, described in the "System cycles" section below.
• Local declaration:
  • Default-initialized (local x,y,z): 2 cycles, regardless of the number of locals (even one).
  • Initialized (local x,y,z=1,2,3): minimum 1 cycle per initialized local (0 cycles for locals initialized with an expression that already has a cycle cost).
  • Partially initialized (local x,y,z=1,2): 2 cycles plus minimum 1 cycle per initialized local.
• Bitwise operators (&, |, ^^, <<, >>, >>>, <<>, >><, ~): 1 cycle.
• Logical operators:
  • and/or: 0 cycles if short-circuited, 1 cycle otherwise; +2 extra cycles unless directly inside an if/while/and/or.
  • Unary not: 2 cycles.
• Nil/false/true constants: 2 cycles.
• Relational operators (<, >, <=, >=, ==, !=): 2 cycles; +2 extra cycles unless directly inside an if/while/and/or.
• String concatenation operator (..): 4 cycles.
• Memory peek operators (@, %, $): 1 cycle.
• Table element access: 2 cycles.
• Table construction:
  • With at least one positional (list-style) element: 4 cycles + 1 cycle per list-style element + 2 cycles per map-style element.
  • Otherwise: 2 cycles + 2 cycles per element.
  • The per-element cost is max'ed with the cost of the expression that defines that element. (So {1+2} costs 5 cycles, not 6.)
• Table length (#): 2 cycles.
• Function construction: 2 cycles. (Todo: even if it captures locals? That definitely wasn't the case before...)
• Function call: 4 cycles + minimum 1 cycle per argument.
  • The 1-cycle-per-argument cost is max'ed with the cost of the expression that defines that argument.
  • The base cost is only 3 cycles instead of 4 if the function is accessed through a local variable of the same function.
• Function return: 2 cycles + minimum 1 cycle per return value.
  • A function that returns without an explicit return statement also costs 2 cycles (you can think of it as an implicit return statement).
• If statement: 2 cycles per evaluated if/elseif.
  • This cost is max'ed with the cost of the expression in the if/elseif.
• While loop: 2 + 4n cycles, where n is the number of iterations. (Todo: that much?! Need to double-check.)
  • 2 cycles per iteration are max'ed with the cost of the expression in the while.
• Numeric for loop: 7 + 2n cycles, where n is the number of iterations.
• do … end: 0 cycles.
• Metamethod access: 0 cycles (doesn't include the cost of the metamethod itself).
• Goto statement: 2 cycles.
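A few of these rules are easier to apply as formulas. The Python sketch below encodes some of them; each helper returns the cost of the construct itself, with sub-expression costs passed in only where a "max'ed with" rule needs them. The while-loop figure inherits the Todo above, so treat it with suspicion (helper names are hypothetical):

```python
# Helpers that apply a few of the formulas listed above (0.2.4 measurements).
# Each returns only the cost of the construct itself.


def numeric_for_overhead(iterations: int) -> int:
    """for i=a,b do ... end, excluding the body: 7 + 2n."""
    return 7 + 2 * iterations


def while_overhead(iterations: int, cond_cost: int = 0) -> int:
    """while cond do ... end, excluding the body: 2 + 4n, where 2 of the 4 cycles
    per iteration are max'ed with the cost of the condition."""
    return 2 + iterations * (2 + max(2, cond_cost))


def multi_assign_cost(n_targets: int, rhs_costs: list) -> int:
    """x1,..,xn = y1,..,yk: (max(n,k) - 1), plus 1 per right-side expression
    that has no cycle cost of its own (cost 0)."""
    k = len(rhs_costs)
    return (max(n_targets, k) - 1) + sum(1 for c in rhs_costs if c == 0)


def table_ctor_cost(list_costs: list, map_costs: list) -> int:
    """{...}: each element's per-element cost is max'ed with its expression's cost."""
    if list_costs:
        return 4 + sum(max(1, c) for c in list_costs) + sum(max(2, c) for c in map_costs)
    return 2 + sum(max(2, c) for c in map_costs)


if __name__ == "__main__":
    print(numeric_for_overhead(100), while_overhead(100))  # 207 402
    print(multi_assign_cost(2, [0, 0]))  # x,y=y,x -> 3
    print(table_ctor_cost([1], []))      # {1+2} -> 5
```

By these figures, the bookkeeping for 100 iterations is 207 cycles for a numeric for loop versus 402 for a while loop, before counting the body (and the counter increment the while loop needs).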

Lua CPU stats were only updated every 2048 cycles as of 0.1.12c, but in 0.2.0 they started being updated at a precision closer to once per conceptual operation.

## Functions that add negative Lua cycles

Some functions have negative Lua cycles associated with them, which the PICO-8 runtime subtracts from the Lua cycle count. This mechanism lets PICO-8 make these functions artificially cheaper.

For instance, `poke(x,y)` should cost 8 cycles (2 to read the global `poke`, 4 for the call, and 1 per argument), but each call subtracts 4 cycles from the Lua cycle counter, for a total of 4 cycles.

The table below lists functions that have their total cost tweaked in this way.

| Function | Total cost (cycles) | Notes |
|---|---|---|
| `peek(x)`, `peek2(x)`, `peek4(x)` | `3` + `2` system | Only when called with 1 argument. Operators are faster still. |
| `poke(x,y)`, `poke2(x,y)`, `poke4(x,y)` | `4` + `2` system | Only when called with 2 arguments. |
| `band(x,y)`, `bor(x,y)`, `bxor(x,y)` | `4` | Only when called with 2 arguments. Operators are faster still. |
| `bnot(x)` | `4` | Only when called with 1 argument. Operators are faster still. |
| `shl(x,y)`, `shr(x,y)`, `lshr(x,y)` | `4` | Only when called with 2 arguments. Operators are faster still. |
| `rotl(x,y)`, `rotr(x,y)` | `4` | Only when called with 2 arguments. Operators are faster still. |
| `flr(x)`, `ceil(x)` | `4` | Only when called with 1 argument. |
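The refund can be recovered by subtraction. This Python sketch assumes the function is read from a global (2 cycles) and called with cheap arguments, as in the `poke` example above, and compares the result with the 1-cycle operator equivalents from the "Lua cycles" list. The refunds it prints are inferred, not separately documented:

```python
# Recover the cycles PICO-8 refunds for some built-ins, following the poke example:
# standard cost of calling a global function = 2 (read the global) + 4 (call)
# + 1 per cheap argument; subtracting the listed total gives the refund.


def standard_call_cost(n_args: int) -> int:
    return 2 + 4 + n_args


# (argument count, listed total Lua cycles), copied from the table above
LISTED = {
    "peek": (1, 3), "poke": (2, 4), "band": (2, 4), "bnot": (1, 4),
    "shl": (2, 4), "rotl": (2, 4), "flr": (1, 4),
}

# 1-cycle operator equivalents from the "Lua cycles" list (poke and flr have none there)
OPERATOR = {"peek": "@x", "band": "x&y", "bnot": "~x", "shl": "x<<y", "rotl": "x<<>y"}
OPERATOR_COST = 1


def implied_refund(name: str) -> int:
    n_args, total = LISTED[name]
    return standard_call_cost(n_args) - total


if __name__ == "__main__":
    for name, (n_args, total) in LISTED.items():
        line = f"{name}: {standard_call_cost(n_args)} - {implied_refund(name)} = {total}"
        if name in OPERATOR:
            line += f" (operator {OPERATOR[name]}: {OPERATOR_COST})"
        print(line)
```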

## Functions that add Lua cycles

A few functions consume additional Lua cycles (in addition to the standard cost of 2+(#arguments)):

Out of date: measured on PICO-8 0.1.12d RC10.

| Function | Extra Lua cycles | Notes |
|---|---|---|
| `add()` | `10` | |
| `all()` | ??? | TODO: results wildly unclear. |
| `del()` | `if n-s > 0 then 8+(2+n-s)*6 else 8` | `n` is the size of the table; `s` is 1 if an element was deleted and 0 otherwise. |
| `foreach()` | `if n > 0 then 4+n*(10+c) else 24` | `n` is the size of the table; `c` is the cost of the function passed to `foreach()`. |
| `tostr()` | `28` if the argument is a table, else `18` | |
| `printh()` | `32` | |
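Since `del()` and `foreach()` scale with table size, it can help to plug numbers into their formulas. A Python sketch using the out-of-date 0.1.12d figures above verbatim (helper names are hypothetical):

```python
# Cost formulas for del() and foreach() from the (out-of-date, 0.1.12d) table above.


def del_extra_cost(n: int, deleted: bool) -> int:
    """Extra Lua cycles of one del() call on a table of size n."""
    s = 1 if deleted else 0
    return 8 + (2 + n - s) * 6 if n - s > 0 else 8


def foreach_extra_cost(n: int, c: int) -> int:
    """Extra Lua cycles of foreach() on a table of size n, where c is the cost of the function."""
    return 4 + n * (10 + c) if n > 0 else 24


def delete_all_one_by_one(n: int) -> int:
    """Total extra cost of emptying an n-element table with n successful del() calls."""
    return sum(del_extra_cost(size, True) for size in range(n, 0, -1))


if __name__ == "__main__":
    print(del_extra_cost(100, True))    # 614
    print(foreach_extra_cost(100, 10))  # 2004
    print(delete_all_one_by_one(100))   # 31688
```

Because each `del()` is linear in the table size by these formulas, emptying a table one element at a time grows roughly quadratically: about 31,688 extra cycles for 100 elements.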

The following functions neither add nor subtract cycles, and cost the standard amount:

• `sgn()`, `abs()`, `sin()`, `cos()`, `atan2()`
• `min()`, `max()`, `mid()`
• `camera()`, `clip()`, `cursor()`, `fillp()`, `pal()`, `palt()`
• `fget()`, `fset()`, `mget()`, `mset()`, `pget()`, `pset()`, `sget()`, `sset()`

## System cycles

A few functions also consume system cycles, which are added on top of their standard Lua cycle cost.

System CPU stats are updated after each call.

Out of date: measured on PICO-8 0.1.11g.

| Function | System cycles | Notes |
|---|---|---|
| `cls()` | `2048` | Same cost as `rectfill()` of the same size. |
| `print()` | `4+n*16` | `n` is the number of characters in the string, even those not rendered; spaces, newlines, and double-width glyphs each count as one character. |
| `spr()` | `2*n` | `n` is the number of pixels drawn, including transparent pixels (width × height of the sprite rectangle). Cost is 0 if the first argument is outside the [0, 255] range. |
| `sspr()` | `2*n` | `n` is the number of pixels drawn, including transparent pixels (width × height of the destination rectangle). |
| `rect()` | `2*max(1,2*ceil(a/4)) + 2*max(0,2*ceil(b/2-1))` | Where `w,h = abs(x2-x1),abs(y2-y1)` and `a,b = max(w,h),min(w,h)`. |
| `rectfill()` | `2*max(1,flr(n/16))` | `n` is the number of pixels drawn (width × height). |
| `circ()` | `4+n*8` | Warning: this formula is incomplete for clipped circles. |
| `circfill()` | `2*n*flr((n+9)/4)` | Warning: this formula is incomplete for clipped circles. |
| `line()` | `2*ceil(n/2)` | `n` is the number of pixels drawn; there is an additional cost of `1` if at least one pixel had to be clipped. |
| `map()` / `mapdraw()` | `2*max(1,n*64)` | `n` is the number of sprites rendered; only map cells that are not zero are counted. |
| `music()` | `32` | No cost if called without arguments. |
| `sfx()` | `32` | No cost if called without arguments. |
| `memcpy()` | `2*(n+1)` | `n` is the number of bytes copied. |
| `memset()` | `2*max(1,ceil(n/2))` | `n` is the number of bytes set. |
| `cstore()` | `2*max(1,n*64)` | `n` is the number of bytes stored. |
| `reload()` | `2*max(1,n*8)` | `n` is the number of bytes reloaded. |
| `btn()` | `8` | No cost if called without arguments. |
| `btnp()` | `8` | No cost if called without arguments. |
| `rnd()` | `8` | |
| `srand()` | `16` | |
| `sqrt()` | `48` | Only `32` if the argument is zero. |
| `x^y` | `16*(n+1)` | `n` is the position of the last fractional bit in `y`; for instance, cost is `16` for any integer such as `y == 13`, and `16*3` for `y == 1.25`. |
| `stat()` | `32` | |
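A Python sketch that evaluates a few of these formulas, for example to check that a full-screen `rectfill()` matches `cls()`. The figures come from the out-of-date 0.1.11g table above; `pow_cost` follows the `16*(n+1)` formula, reading `y` as a 16.16 fixed-point number (PICO-8's number format) to find its last set fractional bit. Helper names are hypothetical:

```python
import math

# Estimated system cycles for a few calls, using the (out-of-date, 0.1.11g) table above.


def rectfill_cost(width: int, height: int) -> int:
    """rectfill() of a width x height rectangle (n = pixels drawn)."""
    return 2 * max(1, (width * height) // 16)


def spr_cost(w_tiles: int = 1, h_tiles: int = 1) -> int:
    """spr() with a w x h block of 8x8 sprites; transparent pixels still count."""
    return 2 * (8 * w_tiles) * (8 * h_tiles)


def print_cost(text: str) -> int:
    """print(); every character counts, including spaces and newlines."""
    return 4 + len(text) * 16


def memset_cost(n_bytes: int) -> int:
    return 2 * max(1, math.ceil(n_bytes / 2))


def pow_cost(y: float) -> int:
    """x^y: 16*(n+1), n = position of the last set fractional bit of y in 16.16 fixed point."""
    raw = round(y * 65536)  # 16.16 fixed-point representation of y
    if raw & 0xFFFF == 0:
        n = 0  # integer exponent: no fractional bits set
    else:
        lowest_bit = (raw & -raw).bit_length() - 1  # index of the lowest set bit
        n = 16 - lowest_bit
    return 16 * (n + 1)


if __name__ == "__main__":
    print(rectfill_cost(128, 128))       # 2048, same as cls()
    print(spr_cost())                    # 128
    print(print_cost("hello"))           # 84
    print(memset_cost(0x2000))           # 8192
    print(pow_cost(13), pow_cost(1.25))  # 16 48
```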