Status of Raspberry Pi native support

NinjaCowboy · 25450

nikos

  • Senior Member
  • ****
    • Posts: 374
    • Karma: +71/-3
    • aspireos
Reply #15 on: January 19, 2019, 05:57:47 PM
If there is one thing that we need for the Pi port, it is the Amiga 68k stuff running.
 


NinjaCowboy

  • Junior Member
  • **
    • Posts: 66
    • Karma: +18/-0
Reply #16 on: January 20, 2019, 07:53:36 AM
Awesome. Looks like some nice progress. Why is big endian ARM targeted instead of little endian, which ARM traditionally uses? Is it to make it easier to emulate software for the Motorola 68k, which is also big endian?
Exactly!  Now you get it!

It's reasons like this that MorphOS and AmigaOS 4 can JIT 68k code quickly and seamlessly while AROS x86 stumbles along with a UAE derivative emulator.  That, and many open source projects use AmosPro, AmiBlitz, AmigaE or other programming languages that don't run on little endian CPUs.

What does the PowerPC version of AROS use for emulation? While it's easier to use the same endianness, I doubt it's that big of a performance barrier. The only instructions affected are those that read and write to memory, and those are slow anyway because memory speed is always a bottleneck in modern computers. x86 also has fast byte swapping instructions that can be used.



wawa

  • Senior Member
  • ****
    • Posts: 265
    • Karma: +55/-0
Reply #17 on: January 20, 2019, 08:21:28 AM
aros has no transparent emulation like morphos or os4 at this time. so far it is janus uae afaik, a version of winuae with a mui frontend (correct me if im wrong). michal is inclined to add that transparent emu layer on pi, since it would make sense on a big endian host.
btw. are you compiling/building armeb (big endian target)?
if you share your mail with me (pm?) i can invite you to the slack channel michal is monitoring. there you could also get other assistance concerning aros concepts and details from experienced developers who might not post here.



Samurai_Crow

  • Junior Member
  • **
    • Posts: 88
    • Karma: +32/-0
  • Hobby coder
Reply #18 on: January 20, 2019, 12:47:13 PM
Quote
What does the PowerPC version of AROS use for emulation? While it's easier to use the same endianness, I doubt it's that big of a performance barrier. The only instructions affected are those that read and write to memory, and those are slow anyway because memory speed is always a bottleneck in modern computers. x86 also has fast byte swapping instructions that can be used.

To use a BSwap opcode on 486+ the JIT would have to be specially written to use it or a big endian version of x86 AROS using a custom C backend in GCC would have to be written.  Either would be difficult to maintain and would be noticeably slower than regular AROS.

AMD64 has byte swapping load and store opcodes. But they still don't support the reads built upon the CISC style addressing modes present so those wouldn't be faster than BSwap.

Having a global big endian mode like the one ARM7 and MIPS offer is definitely advantageous in these situations.



cdimauro

  • Member
  • ***
    • Posts: 164
    • Karma: +26/-1
Reply #19 on: January 20, 2019, 03:56:25 PM
AMD64 has no byte swapping load and store opcodes. If you are referring to MOVBE, it was first introduced on Intel's Atoms, and then on subsequent Intel processors. Later it was also implemented on AMD processors.

So, MOVBE isn't part of the standard AMD64 ISA, but only some processors have it (there's a CPU flag to check for its presence).

MOVBE is definitely MUCH better than BSWAP, because it allows loading and storing BE data using any x86/x64 addressing mode, which is a vast improvement over BSWAP. Performance also benefits much more than it does with BSWAP.
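
For readers following the BSWAP vs. MOVBE point, here is a minimal C sketch of a byte-swapped 32-bit load and store as an emulator on a little-endian x86 host might express it. The helper names `load_be32` and `store_be32` are hypothetical, the code assumes a little-endian host and a GCC-compatible compiler (`__builtin_bswap32`), and it is not taken from any emulator mentioned in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers; assumes a little-endian host such as x86. */

/* Read a 32-bit big-endian value from guest memory. */
static inline uint32_t load_be32(const void *addr)
{
    uint32_t raw;
    memcpy(&raw, addr, sizeof raw);  /* ordinary load, any addressing mode */
    return __builtin_bswap32(raw);   /* swap after loading */
}

/* Write a 32-bit value to guest memory in big-endian byte order. */
static inline void store_be32(void *addr, uint32_t value)
{
    uint32_t raw = __builtin_bswap32(value);  /* swap before storing */
    memcpy(addr, &raw, sizeof raw);
}
```

Written this way, the choice between a separate BSWAP and a single MOVBE is left to the compiler: on targets where MOVBE is enabled (for GCC, the `-mmovbe` option), the load and the swap may be fused into one instruction.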



NinjaCowboy

  • Junior Member
  • **
    • Posts: 66
    • Karma: +18/-0
Reply #20 on: January 20, 2019, 05:49:34 PM
Well, I meant that you can still use any of your x86 load/store instructions within whatever addressing mode. The only difference would be that you'd insert a bswap/xchg instruction before storing and after loading. There's no reason to modify GCC or anything, as this would only occur in the code generated by the JIT in the m68k emulator.

Back on topic, I think making the Pi version big endian is quite interesting, since I don't think anyone has ever run a big endian OS on the Pi before.



mschulz

  • Moderator
  • Newbie
  • *****
    • Posts: 12
    • Karma: +80/-0
Reply #21 on: January 21, 2019, 07:55:36 AM
Quote
Well, I meant that you can still use any of your x86 load/store instructions within whatever addressing mode. The only difference would be that you'd insert a bswap/xchg instruction before storing and after loading. There's no reason to modify GCC or anything, as this would only occur in the code generated by the JIT in the m68k emulator.

No you can't. This has been discussed many, many times on different forums already. Since any Amiga-like OS shares all its internals with all running programs, you have to use the same endianness for both the OS and applications. Because of that you cannot make a big-endian m68k emulation layer for little-endian x86 AROS: you never know what kind of access you are translating and whether an endian swap is needed or not. Besides, your translated binaries would be lost if they fetched data as multiples of the type size (e.g. 4 UBYTEs fetched as one ULONG).
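
The ambiguity described above can be shown with a small C sketch. All function names here are hypothetical and the little-endian layout is written out byte by byte, so the example behaves the same on any host. It models a single translated m68k long read that hits either a ULONG field written by native little-endian code or four UBYTEs that m68k code fetches as one ULONG.

```c
#include <stdint.h>

/* Layout a little-endian native (x86) OS would produce for a ULONG field. */
static void native_store_ulong(uint8_t *mem, uint32_t v)
{
    mem[0] = (uint8_t)(v & 0xFF);
    mem[1] = (uint8_t)((v >> 8) & 0xFF);
    mem[2] = (uint8_t)((v >> 16) & 0xFF);
    mem[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Translated m68k long read if the JIT always byte-swaps (big-endian view). */
static uint32_t m68k_read_ulong_swapped(const uint8_t *mem)
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
}

/* Translated m68k long read if the JIT never swaps (host little-endian view). */
static uint32_t m68k_read_ulong_raw(const uint8_t *mem)
{
    return ((uint32_t)mem[3] << 24) | ((uint32_t)mem[2] << 16) |
           ((uint32_t)mem[1] << 8)  |  (uint32_t)mem[0];
}
```

For a field stored with `native_store_ulong(field, 0x11223344)`, only the raw read returns 0x11223344. For the byte array `{0x11, 0x22, 0x33, 0x44}`, where m68k code expects the first byte in the top bits, only the swapped read does. Both cases are the same m68k instruction, so the translator has nothing to decide on.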



nikos

  • Senior Member
  • ****
    • Posts: 374
    • Karma: +71/-3
    • aspireos
Reply #22 on: January 22, 2019, 03:39:14 AM
Update again :)
Now USB works as it should :)

https://www.patreon.com/posts/c-standard-and-24108999


dizzy

  • Junior Member
  • **
    • Posts: 59
    • Karma: +60/-0
    • YouTube channel
Reply #23 on: January 22, 2019, 05:47:30 AM
Great!  :) Could a well placed volatile make a difference in how GCC treats things?
« Last Edit: January 22, 2019, 07:53:07 AM by dizzy »



NinjaCowboy

  • Junior Member
  • **
    • Posts: 66
    • Karma: +18/-0
Reply #24 on: January 22, 2019, 09:05:04 AM
Quote
Well, I meant that you can still use any of your x86 load/store instructions within whatever addressing mode. The only difference would be that you'd insert a bswap/xchg instruction before storing and after loading. There's no reason to modify GCC or anything, as this would only occur in the code generated by the JIT in the m68k emulator.

No you can't. This has been discussed many, many times on different forums already. Since any Amiga-like OS shares all its internals with all running programs, you have to use the same endianness for both the OS and applications. Because of that you cannot make a big-endian m68k emulation layer for little-endian x86 AROS: you never know what kind of access you are translating and whether an endian swap is needed or not. Besides, your translated binaries would be lost if they fetched data as multiples of the type size (e.g. 4 UBYTEs fetched as one ULONG).

Yeah, I thought about it more, and realized that wouldn't work in unioned fields that are read both in words and bytes. It wouldn't be a problem if the m68k code was emulated in isolation (like UAE), but it wouldn't work for memory that's accessed by both the native x86 and emulated m68k code.
« Last Edit: January 22, 2019, 10:10:41 AM by NinjaCowboy »



NinjaCowboy

  • Junior Member
  • **
    • Posts: 66
    • Karma: +18/-0
Reply #25 on: January 22, 2019, 09:59:04 AM
Congrats on getting Poseidon running! Making the pointer volatile should prevent the use of ldm. I think the best solution would be to use __attribute__((aligned(4))) where that buffer is defined in order to force the compiler to align it to a 4 byte boundary. Unaligned access still carries a performance penalty even though the hardware supports it. If that's not possible (due to a packed struct or something where the field absolutely cannot be aligned) then volatile should work, or maybe a macro that assembles a long from individual bytes if you want to be more portable.
« Last Edit: January 22, 2019, 10:08:11 AM by NinjaCowboy »



mschulz

  • Moderator
  • Newbie
  • *****
    • Posts: 12
    • Karma: +80/-0
Reply #26 on: January 22, 2019, 12:00:52 PM
Quote
Making the pointer volatile should prevent the use of ldm.

Since preventing fetches from being merged into an ldm is just a side effect of volatile, I will avoid overusing it as much as possible. It's not what volatile is meant for.

Quote
I think the best solution would be to use __attribute__((aligned(4))) where that buffer is defined in order to force the compiler to align it to a 4 byte boundary

That would not help. Originally it has even better alignment, since the memory area comes from an AllocMem call, but unfortunately IFF is internally packed to a 2 byte boundary, so it wouldn't help at all.
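
To illustrate why buffer alignment alone does not help here: IFF chunks consist of an 8-byte header (4-byte ID plus 4-byte size) followed by the data, and odd-sized data is padded with a single byte, so chunks are only guaranteed to start on even offsets. The sketch below uses hypothetical helper names and is not code from the AROS tree.

```c
#include <stddef.h>
#include <stdint.h>

#define IFF_CHUNK_HEADER 8u  /* 4-byte chunk ID + 4-byte big-endian size */

/* Offset of the chunk that follows a chunk at `offset` holding `size`
 * data bytes. Odd-sized data is followed by one pad byte. */
static size_t iff_next_chunk(size_t offset, uint32_t size)
{
    return offset + IFF_CHUNK_HEADER + size + (size & 1u);
}

/* Nonzero if a 32-bit access at `offset` into a 4-byte-aligned buffer
 * is itself 4-byte aligned. */
static int is_long_aligned(size_t offset)
{
    return (offset & 3u) == 0;
}
```

For example, a chunk at offset 12 (right after a FORM header) with 6 bytes of data puts the next chunk at offset 26, which is even but not a multiple of 4, no matter how well the buffer itself is aligned.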

Quote
or maybe a macro that assembles a long from individual bytes if you want to be more portable.

This is exactly what I did, as you can read on my Patreon page :) Actually I have not used a macro but rather a static inline function.
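
A helper of the kind described might look roughly like the following. This is only a sketch under assumptions: the name `read_be32` is made up here and the actual function from the Patreon post is not reproduced.

```c
#include <stdint.h>

/* Assemble a 32-bit big-endian value from four individual bytes.
 * Works for any alignment and on hosts of either endianness, because
 * only byte-sized accesses are performed. */
static inline uint32_t read_be32(const void *ptr)
{
    const uint8_t *p = (const uint8_t *)ptr;

    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

Because the source only ever describes byte loads, the compiler has no grounds to assume 4-byte alignment for the access, which is the property the volatile workaround only provided as a side effect.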



Argo

  • Junior Member
  • **
    • Posts: 75
    • Karma: +46/-0
  • Argo
    • Find me on the Fediverse
Reply #27 on: July 12, 2019, 12:09:55 AM
Ping!
Now that there is a new Pi out getting the shake down



mschulz

  • Moderator
  • Newbie
  • *****
    • Posts: 12
    • Karma: +80/-0
Reply #28 on: July 13, 2019, 07:44:59 AM
Quote
Now that there is a new Pi out getting the shake down

Hi! Sorry for the lack of recent status reports. So far I have spent a lot of time fighting with the AROS ABI for ARM. I'm at a stage where I have real executable files with all the information I need inside, but the GNU linker has some issues making a bootstrap binary from them. Apart from that I have fixed the ARM cross-compiler build (gcc 9.1), and at the moment I am adapting compiler flags in order to let the ARM port behave the same as x86/x86_64/m68k.

And yes, I have a RasPi4 on my desk. Good news there: it does not use the DWC2 OTG USB controller which brought me sleepless nights and headaches. Instead, it has an XHCI controller on one PCIe lane.



ntromans

  • Member
  • ***
    • Posts: 157
    • Karma: +50/-0
Reply #29 on: July 14, 2019, 02:40:04 PM
That really is fantastic news - over the last couple of weeks I've been looking at putting an RPi 4 system together to replace my aging x86 laptops (and it's going to be reasonably straightforward as there's an embarrassment of screens, power supplies and, well, everything for those boards) and it's great to know I might have an alternative to having to run Raspbian.

Cheers,
Nigel.