Multi-threading real-time OS for ARM
I recently started looking into ARM assembly language and got curious how I could program a controller or a Raspberry Pi to run some, or even many, tasks concurrently under the control of some kind of supervising task.
I guess for this I would need a real-time OS or something similar that is well known and works well.
Are there any suggestions for what I could look into?
I could eventually also program in C or C++ if necessary (or useful).
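To make the idea of "tasks under a supervising task" concrete, here is a minimal sketch in C of the simplest form it can take on a bare controller: a cooperative round-robin loop. This is not an RTOS and not how any particular RTOS works; a real-time OS adds preemption, priorities and timing guarantees on top. All names here (task, count_step, supervisor_run) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task model: a step function plus a little per-task state. */
typedef enum { TASK_READY, TASK_DONE } task_status;

typedef struct task {
    const char *name;
    task_status (*step)(struct task *self); /* runs one short slice of work */
    int counter;                            /* per-task state */
    int limit;                              /* task is done when counter reaches limit */
    task_status status;
} task;

/* Example task body: count up to a limit, one increment per slice. */
task_status count_step(task *self)
{
    self->counter++;
    return (self->counter >= self->limit) ? TASK_DONE : TASK_READY;
}

/* Supervisor: round-robin over all tasks until every one reports TASK_DONE.
 * Returns the total number of slices that were run. */
int supervisor_run(task *tasks, size_t n)
{
    int slices = 0;
    size_t remaining = n;

    assert(tasks != NULL || n == 0);
    while (remaining > 0) {
        remaining = 0;
        for (size_t i = 0; i < n; i++) {
            if (tasks[i].status == TASK_DONE)
                continue;                    /* finished tasks are skipped */
            tasks[i].status = tasks[i].step(&tasks[i]);
            slices++;
            if (tasks[i].status != TASK_DONE)
                remaining++;
        }
    }
    return slices;
}
```

With two tasks whose limits are 3 and 5, supervisor_run returns 8: each task gets exactly as many slices as it needs, interleaved. The weakness of this approach is that one slow step function stalls everything, which is the problem a preemptive RTOS is meant to solve.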
FreeRTOS is what runs on the ESP series. I'm not sure about Arduino, and the Raspberry Pi is Linux based. In the beginning Unix was Uni (one) and derived from Multics. I spent 15 years working with a Multics clone, a very powerful OS.
The truth is I am also wondering whether writing a multi-processing solution on an ARM processor makes sense.
Perhaps it makes much more sense to get my feet wet with an (open) FPGA dev board -- although the learning curve would be significantly steeper -- perhaps it's worth it.
-- lots of open questions would then exist -- how to wrap an FPGA based capability into an (accelerated) C library -- how to link it to a large C/C++ program, etc.
Working with FPGAs is fun. I worked for Altera (now Intel) on the FPGA device design and tweaks to the tool chain. Altera tools are free, but I hear many people prefer the Xilinx tools, also free.
I have a couple Cyclone boards @ ~$200 at the time. Nice boards, all the SoC features and free IP in the Cyclone devices, but still pricey for use across a lot of projects.
The Teensy boards look nice; v4.1 is $30-ish with a 600MHz ARM Cortex-M7. I've got one on back order, and I want to check out the open source tools. I'm waiting to see the price on the TinyFPGA EX.
Tool support is key, the big boys bring a lot of capability but as you say the learning curve can be steep.
Could you outline what a setup would look like if I want to create an accelerated C library that interfaces with an FPGA board?
My overall ambitious game plan is to interface between Prolog and an FPGA -- and to define some foreign Prolog predicates via Prolog's foreign language interface that would in effect be processed via C or C++ within an FPGA.
Prolog runs on Linux x86, M1, as well as ARM platforms such as the Raspberry Pi -- my choice would probably be either Linux x86 or ARM -- with a slight preference for ARM ... but that really depends on which architecture an FPGA interfaces with best ...
Any thoughts would be much appreciated,
This is a very cool project, but I do not know Prolog. Here are my thoughts; it's kind of long.
As you say, you have something running either C or a Prolog interpreter, and an FPGA based design.
Does communication between Prolog/C and the hardware design have constraints, real time, low latency, high bandwidth?
If the Prolog/C is running on an external device you'll need to design how they communicate. Ethernet and even USB have latency issues that have to be managed.
Since you mention real time above I would see if an SoC FPGA is the starting point. The ARM and the FPGA design sit next to each other on the same device and the connection can be anything you like.
I'm assuming you are going to start with an existing dev board. You'll need some idea of the size of the FPGA design to be able to select a board. FPGA tools really like to eliminate what they think is unused logic, so it can be tricky to get a good estimate without a full design.
I have basic questions about the Prolog execution environment if you go that route.
- How much memory does the interpreter need?
- Does it expect a file system for temp files etc?
- How fast does the processor need to run the interpreter to keep up with your hardware?
- Does Prolog have support for embedded-programming concepts such as volatile memory-mapped registers and the like?
If the execution environment is running C, the same questions apply, I guess, about size and speed.
What is your h/w description language experience? Have you used Verilog or VHDL, or OpenCL HW?
- I have the most experience with Verilog, a little with VHDL, and almost none with OpenCL for h/w.
- I have heard cool things about OpenCL h/w but I have never used it.
I'd start with those bounds to get it rolling.
re: Does communication between Prolog/C and the hardware design have constraints, real time, low latency, high bandwidth?
This should be as fast as possible -- the idea is to have certain compute that is often called from Prolog, and performed really fast.
From Prolog's perspective it's essentially an in-memory database, with a query facility accessible through new built-in functions (so-called external predicates).
Prolog's VM is written in C, and it loads an external library into its own memory space.
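As a rough sketch of what the C side of such an accelerated library could look like, the following keeps one public entry point and dispatches it to whichever backend is active, with a plain software implementation as the fallback. Everything here is an assumption made for illustration: the names (accel_backend, accel_count_matches, accel_set_backend), the choice of "count matching facts" as the operation, and the idea that an FPGA driver registers itself at startup. The Prolog foreign-predicate wrapper is deliberately left out, since it depends on the Prolog system used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend interface: one function pointer per accelerated operation. */
typedef struct {
    const char *name;
    /* Count how many of the n facts are equal to key. */
    size_t (*count_matches)(const uint32_t *facts, size_t n, uint32_t key);
} accel_backend;

/* Portable software implementation, always available. */
size_t sw_count_matches(const uint32_t *facts, size_t n, uint32_t key)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (facts[i] == key)
            hits++;
    return hits;
}

const accel_backend sw_backend = { "software", sw_count_matches };

/* Currently selected backend; starts out as the software fallback. */
static const accel_backend *active = &sw_backend;

/* An FPGA driver (not shown) would call this once the board is set up.
 * Passing NULL, or a backend without the operation, restores the fallback. */
void accel_set_backend(const accel_backend *b)
{
    active = (b != NULL && b->count_matches != NULL) ? b : &sw_backend;
}

/* Public entry point: this is what a foreign-predicate wrapper would call. */
size_t accel_count_matches(const uint32_t *facts, size_t n, uint32_t key)
{
    assert(facts != NULL || n == 0);
    return active->count_matches(facts, n, key);
}

/* Name of the backend currently in use, handy for logging. */
const char *accel_backend_name(void)
{
    return active->name;
}
```

The point of the indirection is that the Prolog side never needs to know whether an FPGA is attached: the same predicate works on a laptop with the software backend and on the target board with a hardware one.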
re: SoC FPGA with ARM
So from your comments it seems that the FPGA should have access to main memory, and essentially do its compute over data in memory. Not sure if this is an efficient use of an FPGA though.
Perhaps the FPGA itself should be the database, somehow, with certain search patterns over the stored data encoded on the FPGA to provide fast recurring computations, given some initial values.
Which raises the question how large a database an FPGA can be.
Perhaps the FPGA is only a portion of the database, one that doesn't often change but that is needed for computation, some of which could occur off the FPGA and some on it.
I see this is all half-baked in my mind and needs much more thinking ...
Getting to your questions, here's another angle to consider.
This describes something similar to my board. It is from 2013, and at the time the board was $250.00. The Cyclone V SX is still a useful part, so there will be boards out there, if not this exact one. This is just an example because it has two useful pictures:
This is a kitchen sink eval board. It does boot Linux, which is nice. But it has a bunch of stuff on the board: audio, IR send/recv and temperature sensors; the HSMC is their 'cape' interface. 1GB of RAM for the FPGA, 1GB of RAM for the ARM A9.
Anyway, if you scroll down you'll see an image, "SoCKit Block Diagram". They draw the device in two halves.
The HPS, hard processor system, this is where the ARM(s) are located.
The other side is the FPGA portion (aka fabric), logic goes here and that logic can include 'soft processors' if you want. You can customize anything in the FPGA side but not the HPS.
re: memory access
There is a DDR3 interface on both sides of this device. So ARM/FPGA can access memory at the same time. The FPGA DDR3 is optional. And there is also embedded RAM on chip in the FPGA side.
re: comms between ARM and FPGA
The two halves communicate through an interface that you can design to meet latency/bandwidth goals. The tools let you specify a custom protocol, or you can pick an existing one, like AXI or whatever.
That's the thing, they design these devices to be so flexible, with so many choices and options.
If I ever get my Teensy board I'll have first-hand knowledge of the tools and the boards to share; right now I don't have a feel for what fits or what does not.
re: "1GB of RAM for the FPGA, 1GB of RAM for the ARM A9."
It's unclear to me if the 1GB of RAM can be shared, or if each half strictly gets its own dedicated and non-sharable RAM ...
If it's not shared, then, I guess, the ARM side would need to somehow cause memory contents on its side to "mirror" on the FPGA side ... either continuously, or during a kind of setup step (which would not be ideal for my purpose).
Ideally, it's shared RAM with some synchronization going on -- i.e. when one side changes contents, the other waits before writing into the same location.
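The "one side waits while the other writes" idea can be sketched in C as an ownership handshake on a shared block. This is only a model under loud assumptions: the FPGA side is played by ordinary C code, the names (shared_block, block_write, block_hand_over) are invented, and whether real HPS and fabric accesses behave like C11 atomics depends on the bridge and cache setup of the actual device, which this sketch says nothing about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum { OWNER_CPU = 0, OWNER_FPGA = 1 };
enum { BLOCK_WORDS = 4 };

typedef struct {
    atomic_int owner;             /* which side may currently touch data[] */
    uint32_t   data[BLOCK_WORDS]; /* payload shared by both sides */
} shared_block;

void block_init(shared_block *b, int first_owner)
{
    atomic_init(&b->owner, first_owner);
    for (int i = 0; i < BLOCK_WORDS; i++)
        b->data[i] = 0;
}

/* Acquire-load so that data written before a hand-over is visible afterwards. */
bool block_owned_by(shared_block *b, int side)
{
    return atomic_load_explicit(&b->owner, memory_order_acquire) == side;
}

/* Write one word; refuses (returns false) when the other side owns the block,
 * in which case the caller has to wait and retry. */
bool block_write(shared_block *b, int side, unsigned idx, uint32_t value)
{
    assert(idx < BLOCK_WORDS);
    if (!block_owned_by(b, side))
        return false;
    b->data[idx] = value;
    return true;
}

/* Give the block to the other side; release-store publishes earlier writes. */
void block_hand_over(shared_block *b, int from, int to)
{
    assert(block_owned_by(b, from));
    atomic_store_explicit(&b->owner, to, memory_order_release);
}
```

In this model the CPU side fills the block, hands it over, and from then on its own writes are refused until the other side hands the block back, so the two sides never write the same location at the same time.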
For a keyword search use "HPS-FPGA Bridge". This video seems to have an overview in the first half; it dives right into an example in the second half, so it will be a big info hose if you have not seen the Intel tools before.
I came across this while I was poking around.
I wonder if there is a published application close enough to your intent that you can use as a template?
Great. Thank you for the further info:
Yes this is all new to me ...
I guess, intuitively speaking, the FPGA fabric should appear like an IC to the HPS/ARM part ... this means there should be something like addressable input and output pins, some kind of clock, and some kind of clocked behavior -- which is what the IC does ...
In fact the fabric could appear like many ICs ... each with its own clock -- while all clocks are synchronized by an ueber clock
Perhaps a technical question:
Considering that an FPGA is an inherently limited amount of "logic gates" --- is it possible to implement a virtual FPGA that redoes itself at high speed (and if so, at what speed?).
Suppose I want to create a network of logic gates that is larger than the FPGA ...
As a software engineer, I would think that it could perhaps work like a "page fault" -- when the FPGA tries to send a signal to an off-FPGA circuit item -- an interrupt happens -- and a *selected* part (page) of the FPGA is redone to include the circuit part needed to complete the signal propagation.
If something like this exists then the interconnection between paged IC parts would need very careful planning to ensure that there is no "thrashing" ...
I guess perhaps, when it comes to such "pages" the FPGA behaves more like a serial processor than parallel logic gates ...
Don't know if this at all makes sense in logic hardware 🙂
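The page-fault analogy and the thrashing worry can be played through in software before touching any hardware. The sketch below is a toy model in C: a handful of reconfigurable regions, a trace of which circuit module is needed next, and least-recently-used replacement on a miss. It only counts reloads; it assumes nothing about how a real FPGA or its tools handle this, and all names (fabric_sim, fabric_sim_use, fabric_sim_run) are invented.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8

typedef struct {
    int    module[MAX_SLOTS];    /* circuit module held by each region, -1 = empty */
    size_t last_used[MAX_SLOTS]; /* time of the most recent use of each region */
    size_t slots;                /* number of reconfigurable regions in this model */
    size_t clock;                /* counts requests */
    size_t reconfigs;            /* "page faults": how often a region was reloaded */
} fabric_sim;

void fabric_sim_init(fabric_sim *f, size_t slots)
{
    assert(slots > 0 && slots <= MAX_SLOTS);
    f->slots = slots;
    f->clock = 0;
    f->reconfigs = 0;
    for (size_t i = 0; i < MAX_SLOTS; i++) {
        f->module[i] = -1;
        f->last_used[i] = 0;
    }
}

/* Ask for a module; on a miss, reload the least recently used region. */
void fabric_sim_use(fabric_sim *f, int module)
{
    size_t victim = 0;

    assert(module >= 0);
    f->clock++;
    for (size_t i = 0; i < f->slots; i++) {
        if (f->module[i] == module) {       /* hit: module is already loaded */
            f->last_used[i] = f->clock;
            return;
        }
        if (f->last_used[i] < f->last_used[victim])
            victim = i;                     /* remember the oldest region */
    }
    f->module[victim] = module;             /* miss: reconfigure that region */
    f->last_used[victim] = f->clock;
    f->reconfigs++;
}

/* Run a whole trace of module requests and return the reload count. */
size_t fabric_sim_run(fabric_sim *f, const int *trace, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fabric_sim_use(f, trace[i]);
    return f->reconfigs;
}
```

With two regions, the trace 0,1,0,1,0,1 needs only the 2 initial loads, while 0,1,2,0,1,2 reloads on all 6 requests, which is thrashing in exactly the page-fault sense. Giving the same trace three regions brings it back down to 3 loads, so the working set of modules has to fit the number of regions.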
"Considering that an FPGA is an inherently limited amount of "logic gates" --- is it possible to implement a virtual FPGA that redoes itself at high speed (and if so, at what speed?)."
You've hit on a very interesting topic. Intel calls this 'partial reconfiguration' (PR).
As you state: part of the FPGA is kept alive while the other parts are reprogrammed.
From the tools perspective this is wickedly hard to do for a number of arcane reasons caused by the heterogeneous physical design of the fabric. Not all regions have the same fabric resources, and so not all designs can be placed arbitrarily in the region allocated for PR.
"If something like this exists then the interconnection between paged IC parts would need very careful planning to ensure that there is no "thrashing" ..."
You are right, your design must also comprehend PR: since you are messing with clocks, you have to shut down and start up gracefully. And of course the design must be able to keep itself and any attached sensitive peripherals alive while the reprogramming happens.
Reprogramming times depend on the fabric size and the size of the region you are reprogramming, but it is common to talk about 1-2 seconds for a full reprogram. It can be done faster; it takes some planning though.
I did see some app notes on partial reconfig on the Intel site. I don't know if Intel ever got PR to the point that it was turnkey.
These guys were ahead of the pack for a while, making good progress. They are not in business any more.
Thanks, very interesting ... -- it doesn't bode well -- with them out of business 🙂
An additional route I started to wonder about is neuromorphic computing ...
Intel has (had?) a chip and there is now a company Mythic that seems to have created such hardware too ...
The concept sounds interesting -- and I am also wondering if they have a way to reconfigure partially ... perhaps it's easier since the hardware is already dedicated to specific kinds of circuits.
hmmm, why do I think it's not a coincidence you mention Mythic.
You had me going, nicely played. 😀
I have no relationship with Mythic -- if this is what you are thinking.
But I would want to try to get my hands on their hardware if it's possible, including cost-wise, and if such an approach could be relevant.
There is also Intel, which has a neuromorphic chip -- but, I think, it's only available for university research right now.
I am trying to understand the spectrum of options and see what could be feasible for what I have in mind.
And I am unsure whether the size limits of FPGAs or neuromorphic computers are essentially putting me back to looking at a lightweight C or assembly based multi-threading (multi-core) approach linked to Prolog.
Indeed, having noticed your deep experience with FPGAs and acceleration I wanted to mention the whole spectrum of options I am considering to possibly get further feedback from your experience ...