nemoBasic
© nemo 1989-2023
This page documents the Bugfixes and new Syntax that nemoBasic features. It draws upon Basic builds from Acorn, Pace and RISCOS Ltd over 30 years, plus many versions I’ve created since RO3 including DBasic and BADBasic (that’s Basic Assembler Dialects, not modesty).
There’s an awful lot of stuff here, now I see it written down, but my favourites are:
Error CASE
Truthy CASE
Unicode String Recognition
Array Arrow Operator
I’ll get the boring bugfixes out of the way first (of which there are many), and then get onto the new features.
Bugfixes
Many of Basic’s bugs are very old, and the worst of them dates back to a misdesign in the earliest versions of 6502 Basic. Fixing that required a tiny change to memory consumption, which means in extremis a very tightly-fitted WimpSlot would be insufficient for nemoBasic to run a Basic program that would otherwise be 100% compatible. This is the reason for the change of name – nemoBasic is not a drop-in replacement for BBC Basic.
These are the BBC Basic bugs that are fixed:
ELSE in strings
If the ELSE token appears in a string literal after IF, it was mistaken for ELSE.
0^.5
Is zero, not “Logarithm range”.
WHEN3o,4
Any unconsumed postfix to a WHEN value was not only not faulted, it masked all following values on the line.
who+=1
Throws “No such var” not “Mistake”
TIME $A%
Does not get crunched to TIME$ A%.
FOR?a
Faulted to prevent NEXT corrupting adjacent memory, which it has done since the Beeb.
ACS is not WHILE
WHILEFALSE used to mistake ACS for an embedded WHILE.
LIST of NEXT,,
List understands NEXT commas.
Images
nemoBasic can interact with imagefs files.
QUIT value
Never quite worked correctly.
REPORT$
Doesn’t corrupt memory in extremis.
=$memory
Won’t corrupt memory when trying to read an overlong string.
CALL/USR
Prevented from accessing zeropage.
Assembler = operator
Now actually works by design, instead of by a series of coincidences (in all other Basic builds it can call a number of wrong operators).
ALIGN
Has always been broken when using O%.
Assembler &who$
Unknown strings in assembly aren’t treated as integers!
A$=+"wot?"
Unary plus is no longer legal for strings.
SWAP local(),global()
SWAP is prevented from breaking everything.
real=INTverybigreal
INT works on large reals.
INKEY
Returns sensible results when scanning a range of keys.
Bitshifts are same precedence as power
Strictly speaking incompatible – 1<<8+1 is now 257 not 512. This makes A+B<<8+C<<16+D<<24 do what you’d want (i.e. not throw “Unknown or missing variable”, D’oh). This also fixes the frankly ridiculous:
x=0
IF x = 1<<31 THEN
PRINT "What?"
ENDIF
(and try setting x=1 in the first line and see what you get)
BGET at EOF
Now returns -2 rather than 254, which is easier to spot and more in keeping with the Beeb’s intention when it did the same undocumented thing. And on that subject...
INPUT# doesn’t invent a byte
For the same obscure Beeb MOS reason, INPUT# would fail to notice a missing final byte and silently substitute 254 instead. Bad.
POINT(a+b)*c,d
Tokenisation of the above would get confused.
POS & VPOS mistreated OS_Byte
OS_Byte is not limited to byte-sized results (or parameters, or reason code), so don’t limit the result.
$ allowed to access scratch space
$addr should be allowed to do what !addr does (but zeropage still not allowed).
LOCAL doesn’t work
This is such a big one it requires its own section
LOCAL has always been completely broken – both in the explicit sense of LOCALa,b,c and the implicit sense of DEFPROCa(b,c).
BBC Basic requires that the memory location that holds the value of a variable does not change over time, for many reasons including caching and the inscrutability of its own stack. Therefore, in order for LOCAL to be implemented, the only recourse is to temporarily move the ‘global’ value of the variable somewhere safe, so that the ‘local’ value can be put in the same memory. When ENDPROC or = is reached, the saved value is returned to its memory location
...unless there was an error. If there was an error, then every LOCAL since the most recent ONERROR is simply thrown away. The global value – the value the programmer wanted to protect – is discarded, and the temporary local value becomes the new permanent global value.
It did this on the BBC Micro. It still does it in 2023 in BBC Basic. It was always wrong, but a slight inconvenience in a 5KB program on a Beeb is major architectural problem on a 1MB multitasking desktop application, which has to deal with errors all the time, usually not of its making.
nemoBasic solves this problem: LOCAL always works correctly. This leads to further enhancements, as once the interpreter has a mechanism for ensuring that something that has been temporarily changed can definitely be restored no matter what happens, all kinds of new functionality can be implemented.
Therefore a number of new features build on this one bugfix. It was caused, in 6502 and RISC OS Basics, by the inability of the interpreter to unwind its own stack. Perversely, it could only understand the stack contents contextually... and when the error handler is called, that context has been lost.
So nemoBasic ensures that it can always unwind its stack. This requires a few extra bytes per function call (PROC & FN), and a very few other places. This additional memory usage could cause a very tightly-fitted WimpSlot to run out of memory, so nemoBasic is not in general a drop-in replacement for BBC Basic.
New Syntax
Although nemoBasic has a few added keywords, I have tried to introduce new functionality without new keywords. Basic compressors and editors simply don’t know about new keywords, whereas new syntax using existing keywords is usually already possible.
INPUTEDIT
My OS_ReadLine32 supports editing the contents of the buffer, so nemoBasic enables that with the EDIT modifier.
=VOICES
Returns the number of voices.
=VOICE[$](chan%)
Returns the voice attached to a channel.
VOICEchan%,voice%
Voice number can be used instead of name.
=SOUND
Returns the Sound System Number (the old sound suppression MosVar) used to select between multiple sound systems in my SoundChannels.
SOUNDa[,b[,c[,d[,e]]]]
SOUND takes between one and five parameters, adding SOUNDMODE setting, direct use of the old speech synthesis API, plus Sound System selection (SoundChannels).
=STEREO(chan%)
Returns the stereo position.
=LINE
Returns the current fractional line number – which GOTO and GOSUB support. This allows GOTO, for example, to jump into the middle of a line... as if it wasn’t mad enough already.
=READ[$]
The functional version of READ. e.g. PRINT READ$
=RESTORE
Returns the current DATA pointer as a fractional line number, which RESTORE n supports.
=COLOUR(r,g,b)
As Brandy, constructs a colour (palette) word from three parameters, with clamping.
GCOL[a,]colour;
With a semicolon the colour is interpreted as a 24bit colour (palette) word.
LOCAL ERR/ERL/REPORT/TRACE/RND/COUNT/WIDTH
Many more things can be temporarily protected from change. Note that the LOCAL fix above extends to these values too.
=STRING$(address,reason[,length])
For reading many kinds of string direct from memory:
BPUT(addr%,type,value) and =BGET(addr%,type)
Serialisation primitives allow data to be read and written to memory in many formats, including null-terminated string, ctrl-terminated, Pascal string, reverse Pascal string, 40bit float, 16bit float, IEEE single, double and extended, bfloat16, uint8, sint8, uint16, sint16, uint16BE, sint16BE, uint24, sint24, uint24BE, sint24BE, int32, int32BE. addr% is incremented appropriately, so must be a simple variable.
BPUT(#handle%,type,value) and =BGET(#handle%,type)
Serialisation to and from a file.
=SYS[$]
Converts to and from SWI names.
Hex, ASCII85, RFC1924, Base64 and UTF-8 strings
This is going to change. A$=<414243> is the current syntax for hex, with U for Unicodes as UTF-8; <~...~> for ASCII85; </.../> for 1924; <=...=> for Base64. However, there are some tokenisation problems (eg <CDEF> becomes <CDEF>) which is easiest to fix using quotes, so these are likely to change to A$=<"414243"> and so on.
=IF(boolean,true,false)
The functional version of IF is equivalent to C’s bool?true:false construction, and only one of true/false is evaluated, which is important if they contain FNs.
=DEF(var)
Allows variable, PROC and FN existence to be checked at runtime.
Dyadic bar operator
a|b behaves like a!b.
Monadic logic operators
a AND= b, a EOR= b, a OR= b.
Numerics can be any base
base_digits augments &hex and %binary. So 8_377 is 255.
STR$^n
The binary converter.
STR$(n,base)
The any base converter.
Extended @% & @$
Lowercase exponent, leading/trailing currency character (any 8-bit, over 30 Unicodes) and accounting minus options. @$ is a synonym when setting, and returns the text form.
Multiline constructions
Many things can be split over multiple lines, including array initialisers (making them useful for more than the most trivial arrays), PROC/FN definitions (so their parameters can be documented with inline REM) and expression evaluations (so big sums can be many lines long).
CALL emulation
MOS emulation is retained, which was always avoided by including parameters (i.e. CALLaddr,x) but can be explicitly avoided without parameters with a trailing semicolon.
Safer static ints
A%-Z% are zeroed on entry if started with -quit, but not with -chain.
PRINT#zero
If the handle is zero, PRINT# behaves as PRINT.
=INT(val,type)
Adds all rounding types – floor, ceiling, up, down.
LEFT$/MID$/RIGHT$
Parameter ordering is relaxed so that the string can be the second parameter – a significant speed optimisation.
Assembler ROL#
Assembler allows ROL# as a shift. Long live the 6502.
Assembler % operator
Reserves an amount of memory, like AAsm/ObjAsm.
Assembler tokenisation
The assembler copes with constructions such as MOVE,F and BGETarget that used to fail, and also allows SYS as a synonym for SWI.
Assemble to file
If OPT4 (offset) is on but O% is a byte then it is not a memory address: If OPT8 (limit) is on then L% is relative to P%. If O%=0 then nothing is written but labels are defined and the syntax checked. Else the assembly is written to file handle O%. ALIGN and the new % operator write zeros.
=EXT[#]a$
Without the # returns various built-in symbol values, such as named OPT bits (LIST, ERROR, OFFSET, LIMIT, V5 and NOSYNC). Otherwise returns the file length of the named file or -1.
=EXP a$
Expands (do you see?) the C-style string, replacing ‘backslash’ sequences abdeflnrtv with control codes (\l is CR,LF), \x## with any byte, and \u#### & \U00###### with a UTF-8 encoded Unicode. Also allows C++23 expansions such as \o{##}, \x{###}, \u{###}, plus \U{name} (given ServiceInternational,288 support).
=INSTR(arr(),val)
Finds the index of a value in a one-dimensional array.
VDU
More polymorphic, accepting strings, and apostrophe for NewLine. e.g. VDU''34"Like this"34'''
*nemoBasic -this
Ignored within a nemoBasic program, but causes the BBC Basic interpreter to switch to nemoBasic.
=@thing
Returns the address of the thing – variable, PROC, FN, array or string control block.
ANDWHEN/GOTOWHEN
Perfect for Duff’s Device, allows a WHEN statement to jump to another, later WHEN or OTHERWISE, sharing code.
Ranged WHEN
WHEN<0, WHEN4TO6 and so on. Currently switched off because incompatible with HexStrings.
ERRORCALL
When a PROC or FN checks its parameters and discovers it has been called incorrectly, it may ERROR with a helpful message. BUT though the REPORT$ describes the problem, the ERL simply shows the line number that the problem was detected, not where the problem was caused. The CALL modifier (only within PROC, FN or GOSUB) causes the ERROR to be thrown from the calling line, identifying the location of the mistake.
Truthy strings
Basic has always allowed IF x THEN, taking any non-zero x as truthy. nemoBasic does the same with strings, taking any non-null string as truthy.
PROC(name$)(params)
PROCs and FNs can be called by a variable (or an address – see @).
OPT128
Disables SynchroniseCodeAreas to greatly speed up data assembly.
CMM
CMN should not have been called CMN, because it’s not the same thing as MVN. CMM=Compare MINUS.
Assembler improvements
All possible immediate constants are supported (e.g. MOV R0,#-1). Non-standard encodings are supported (e.g. MOV R0,#4,2). % reserves bytes. = and & can take multiple parameters.
ASC! and ASC%
Allow a FourCC to be converted to integer simply, checking for illegal strings – IF !q%=ASC!"Four" THEN.
FORLOCAL
The index variable of a FOR loop can be made LOCAL, so that it reverts to its previous value after the NEXT.
FORSTEP[var]= and FORNEXT[var]=
FOR/NEXT loops are mutable.
=EVAL(flags,str$)
Allows various features of the expression evaluator to be disabled when evaluating a string.
=SPCnum
Equivalent to STRING$(num," ").
=SPCa$
Strips leading and trailing tabs and spaces from the string. So SPCGET$#F% is really useful.
READarr()
Reads an entire array at a time.
DIM addr filename$ [AUTO|OF filetype][TO len]
Load a file into memory having DIMed the appropriate amount of space. If AUTO or OF filetype is included, then Sprite and Module files (if it is) are handled specially – prepending their lengths as required by various APIs. If TO is included, the actual file length is returned to that variable.
REV$()
One of the few new operators, REV$ reverses strings in various ways: REV$(a$) does the whole string; REV$(a$,start) does the end; REV$(a$,-end) does the start; REV$(a$,start,count) does a section; REV$(a$," ") reverses every word (separated by spaces in that case). So REV$(REV$(a$," ")) reverses the order of the words in the sentence!
BEEP
Does a system beep.
MANDEL i,r
Resurrected from the mists of time, BBC Basic should always have a Mandelbrot generator built-in (doesn’t use the same token as originally because that was reused).
Try/Catch
This gets a section of its own because it’s a significant upgrade to the language and one of my favourite bits of syntax. Many languages have the concept of localised error handling by wrapping the fragile code in a try block and then catching any error. By comparison, error handling in BBC Basic has always been a bit of a sledgehammer.
But nemoBasic has try/catch, using CASE:
CASE ERROR OF
fragile code goes here
as much as you want
WHEN errornumber1
deal with that one error
WHEN errornumber2
handle that error
OTHERWISE
all other errors, optionally
ENDCASE
If the CASEERROR doesn’t have an OTHERWISE, then any unhandled error is thrown from its line as normal. But errors for which there is a catch – a WHEN in this syntax – are simply not triggered as errors. They’re handled just like any other flow control. Within the WHEN and OTHERWISE it is also possible to use ERROR with no parameters, which throws the original error from the originating line as if it hadn’t been caught at all.
This makes it MUCH easier to deal with expected errors – “I expect this might go wrong, so I’ll build handling in here”, in context. It is difficult to do this otherwise in BBC Basic. nemoBasic makes it terribly easy. Lovely syntax.
ELSIF
Another section for a significant change. Many languages have some version of ELSIF – the ability to have a multi-option but flat IF:
IF boolean
ELSIF boolean
ELSIF boolean
ELSE
ENDIF
This is less straightforward in BBC Basic because there is no ELSIF – ELSEIF is two statements and, when the IF is multiline, each requires an additional ENDIF. Hence:
IF boolean THEN
ELSEIF boolean THEN
ELSEIF boolean THEN
ELSE
ENDIF
ENDIF
ENDIF
It can be tricky to keep track of those ENDIFs, and if the construction is within a PROC/FN with early termination, you may never notice your flow control is wrong.
One might try to use CASETRUE to produce the same effect:
CASE TRUE OF
WHEN boolean
WHEN boolean
WHEN boolean
OTHERWISE
ENDCASE
but this is not the same. IF statements have a “true” and a “false” block... but they’re misnamed. Really they’re a “not false” and “false” block – the second is only executed if the parameter is zero. Under ALL other circumstances the first block is executed. So that shouldn’t be called a “true” block, but to borrow Douglass Crockford’s term, a “truthy” block.
But CASE TRUE OF simple compares every WHEN with the value -1, and will only match one that is exactly -1. This is totally unlike IF. What is needed for a flat multiline IF equivalent with ELSIF is a “truthy CASE”. This is achieved in nemoBasic simply by omitting the CASE parameter:
CASE OF
WHENtruthy
WHENtruthy
WHENtruthy
OTHERWISE
ENDCASE
This is exactly equivalent to the IF/ELSIF/ELSIF/ELSE/ENDIF case.
Array Arrow Operator
A significant improvement to array handling, the array operator => iterates over every element of an array (of any number of dimensions) evaluating the expression for each element.
Importantly, during the iteration any reference to this array resolves to the element being iterated. So whereas BBC Basic’s whole-array handling is faked, highly restricted and buggy, the arrow operator can do anything to a whole array that you can do to a single variable.
For example, a%()=RND(100) evaluates a single random number and then stores it in every member of the array, but a%()=>RND(100) evaluates a random number for every element of the array.
Although one can write a%()=a%()*2, Basic’s array reference handling can’t cope with a%()=5+2*a%(). But the arrow operator has no problem: a%()=>5+2*a%(). One can even call FNs; pass in the element being iterated; or the FN can use the array reference in its code. It is even possible to iterate an array with the arrow operator from within an FN called by the arrow operator for that array!
Unicode String Recognition
String operators such as LEFT$, MID$, RIGHT$, INSTR$, ASC, LEN, CHR$ are pretty useless when faced with UTF-8 encoded Unicode text, because their parameters are bytes or byte offsets. e.g. LEFT$(a$,5) returns the first five bytes of a$, not the first five characters. But when dealing with Unicode text, byte offsets are highly inconvenient. If only they could recognise they were dealing with UTF-8!
nemoBasic implements Unicode String Recognition, which is switched on for an operator by including the USR modifier (see what I did there?). For example, whereas the number of bytes in a string is LEN a$, the number of graphemes in the string is LEN USR a$.
To return the number of graphemes to the left of "A" is INSTR(USR str$,"A"). The first three graphemes of a string is LEFT$(USR a$,3), and so on. Whereas ASC a$ returns the first byte of the string, ASC USR a$ returns the first Unicode.
Optimisations
There are many throughout the code, and the smaller ones aren’t bracketed.
LVAR
No longer requires 16 bytes of storage. This frees up two words for other purposes. Reuses the new @$ code.
Strings
Much string code was rewritten to avoid copying when unneccessary and to use word copies whenever possible.
SYS
Uses CallASWI rather than a fragile code veneer, again freeing up workspace.
PLOT
Plot was just weird.
Unfixed bugs
There are several bugs that are difficult or painful to fix.
REM uh oh…
If a REM statement (or ; comment in assembly) ends in the ellipsis character (in Latin1 – character code &8C in general) then it messes up any multiline IF/THEN/ELSE/ENDIF that spans it.
This is a serious bug, but can only be fixed in the REM case by searching EVERY such line for a REM (not in a string!) which is a small overhead, but in the assembly case requires checking every character of every line looking for the [ that enables the semicolon. In the assembly case, the byte would be detokenised as THEN, and is indistinguishable from a textual THEN that would not cause a problem. Not nice.
This also affects byte code &CA inside CASE statements. Same problem, same expensive solution.
DATA is weird
The parsing rules for DATA statements are entirely different from all other Basic lines, and can behave very strangely, including mistaking top-bit-set characters for pseudo-variables like TIME and calling FNs. The polymorphic nature of DATA means this can’t be changed retrospectively.
Array References are faked and wrong
f=3.9:a%=10:a%=a%*f:PRINTa% gives “39” as you’d expect, but f=3.9:a%(0)=10:a%()=a%()*f:PRINTa%(0) gives “30”... which is silly. This affects divide too – f=3.9:a%(0)=39:a%()=a%()/f:PRINTa%(0) gives “13”!
This can’t be changed retrospectively because code may be relying on this bizarre result (which is caused by applying the integer nature of a%() to the real f before doing the calculation – entirely backwards).
RETURN parameters are not inheritable
This is an obscure one, but you’d be forgiven for thinking that these two lines are completely equivalent:
temp$=FNbar(param$):PROCfoo(temp$)
PROCfoo(FNbar(param$))
But they are not, if DEFFNbar(RETURN a$) but DEFPROCfoo(z$) – the non-RETURN of PROCfoo outranks the RETURN of FNbar. This is despite the FN being evaluated before the PROC is called. The implicit LOCAL of the PROC has already occurred before the FN is executed. This is unpleasant to fix, and once again there may be code that only works because of this strange, undocumented, implementation-specific, emergent behaviour.
Multiplies
I tried using UMULL instead of multiple MULs, but that went 45% slower under VirtualRPC (whose MULs are unnaturally fast) so I keep that one switched off.
Works in progress
I’m in the process of merging the Basic Assembler Dialects system from BADBasic, which makes the assembler of nemoBasic massively extensible. This allows the precise target ARM version and instruction repertoire to be specified, to add macros that are familiar from AAsm and ObjAsm, or to switch to 6502, 6809, Sunplus or SuperH assembly... and even to intermix these within the same source.
I have a fully-formed plan for Long Strings in nemoBasic, which is necessary really for UTF-8 handling, as the strings can get so large.
The object-oriented features of OOBasic greatly influenced the syntax of Basalt, but I was never happy with the memory management inside BBC Basic. This will change with the Long Strings work as better garbage collection is required for that.
I have some code to make INSTR do regexp and count occurrences. I’ve made a wide-ranging change to the expression evaluator with a view to returning constness as well as value, allowing constant expressions to be cached rather than repeatedly parsed. This might be of use to CRUNCH, who knows.
There’s lots of changes in anticipation of a multi-tasking IDE/debugger with breakpoints, watchpoints and single-stepping. All untested and untried as yet.
There’s an implementation of dicts – hashmaps, associative arrays – created by DIMdict(-1), which involves the creation of a new class of ‘variable’ – Atoms, used as keys for the dict. Dicts are polymorphic, which can look a bit odd in a language whose only truly polymorphic element is PRINT. i.e. a%("not an int")="also not an int"
FOR iterators aren’t complete yet. The array arrow operator stole much of their thunder, and I haven’t yet implemented FOR c OF a$ when a$="".
“Give us your Basic, nemo”
nemoBasic is usually partly broken, because it contains various half-finished ideas and things I’m planning to go back to. However, literally all the above fixes and new features are bracketed by build switches to make it easier to port them individually. However...
The greatest impediment is the state of official sources. nemoBasic was not created from source, so every symbol, function, variable, table, or address label is my own invention (well, the obvious ones are identical – there’s MATCH and IFLT and so on, but Match_parens and Float_Internal5 are unique to my code).
Plus the formatting and documentation is extremely different. The original ARM BBC Basic source was staggeringly terse. Mine is heavily commented, or I’d not be able to understand it, never mind change it.
So the most realistic use of this source is to select a desired range of features and fixes and build a custom module. But we shall see.
nemo