SMP Update?

trekiej · 13739

dizzy

Reply #45 on: January 09, 2019, 11:22:39 PM
If one wants a program to be multi-threaded, it needs to be coded in such a way that the program spawns subtasks and gives them work to do. Most AROS drivers spawn a handler task, so in essence they are multi-threaded...

Even though a thread and an Amiga-like task aren't equivalent, they serve the same purpose: they allow the CPU to execute code. The AROS and Amiga programming environment differs from environments that use memory protection (e.g. Linux/Windows), and so does the way each perceives a thread.

It's up to the Kernel/Exec to schedule tasks among the different cores. If one spawns subtasks, they may or may not run simultaneously on different cores. I think there might be some flags to instruct a task to run only on a certain core, etc.
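A minimal sketch of the "spawn subtasks and give them work" pattern described above, written in Python rather than against the AROS Exec API. The function name `run_subtasks` and the squaring job are made up for illustration. Which core each worker runs on is left entirely to the scheduler, and in standard CPython builds the global interpreter lock keeps threads from executing Python bytecode at the same instant, so this shows the structure rather than a speed-up.

```python
import queue
import threading

def run_subtasks(jobs, worker_count=2):
    """Spawn worker threads, hand them jobs through a queue, collect results."""
    tasks = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            job = tasks.get()
            if job is None:           # sentinel: no more work for this worker
                break
            with results_lock:        # the results dict is shared by all workers
                results[job] = job * job

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for w in workers:
        w.start()
    for job in jobs:
        tasks.put(job)
    for _ in workers:
        tasks.put(None)               # one sentinel per worker
    for w in workers:
        w.join()
    return results

print(run_subtasks([1, 2, 3, 4]))
```

The main program only decides what the subtasks are and what they get to do; it never says where they run.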



magorium

Reply #46 on: January 10, 2019, 04:00:58 AM
If one wants a program to be multi-threaded, it needs to be coded in such a way that the program spawns subtasks and gives them work to do. Most AROS drivers spawn a handler task, so in essence they are multi-threaded...
Perhaps that is where trekiej's misconception originates?

Let's say, as a (simplified) example, I'm developing an application that plays video files. My application uses system libraries to decode the frames for me. The library exposes this functionality to my application and is implemented in such a way that it decodes the frames using multiple threads.

Does that mean my application is now a multithreaded application? Does that mean my application was "automatically" transformed into a multithreaded application?

I would personally answer no to both of these questions, because it was not the application itself that was 'transformed' into a multithreaded one; rather, it is the library that works that way on behalf of the application.

On the other hand, if I were to say yes to either question, then my answer would be that it was me, the programmer, who decided to make use of a library that I knew divides the workload between multiple threads/cores, and not the operating system that did it automagically for me.
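The video player example can be sketched roughly like this. Everything here is hypothetical: `decode_frames` stands in for the system library and `play` for the application, and the "decoding" is just an upper-casing placeholder. The point is only that the thread pool lives inside the library, while the application code is plain sequential code.

```python
from concurrent.futures import ThreadPoolExecutor

# --- hypothetical "system library" ---
def decode_frames(raw_frames, workers=4):
    """Decode all frames; the thread pool is an internal detail of the library."""
    def decode_one(raw):
        return raw.upper()            # placeholder for real decoding work
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input
        return list(pool.map(decode_one, raw_frames))

# --- hypothetical application ---
def play(raw_frames):
    """Sequential application code: it spawns no threads of its own."""
    frames = decode_frames(raw_frames)
    return [f"show {frame}" for frame in frames]

print(play(["frame-a", "frame-b"]))
```

Nothing in `play` changed because the library uses threads; the decision to call such a library was still made by the programmer.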

I wonder if it could happen during compilation.
Uhm, I see no other way than during compilation. But even then it is the programmer who decides how and when to make use of it.

There are special programming languages that were invented for developing/running code in parallel (for you), so in that regard it can be considered as being done "automatically". But in that case it is still the developer who decides to use that programming language in the first place :)

On a side note, perhaps you can take a look at the OpenMP wiki article. Have a good look at the examples and see that it is (again) the programmer who is responsible for instructing the compiler to turn the code into something that uses multiple threads. The syntactic sugar of the #pragma directives does not change that fact.

Also note that an operating system (its task scheduler) can choose to run an application on another thread/core as it wishes, but that does not turn the application into a multithreaded one; it is merely an application that runs on a specific thread/core. Those are two distinct situations.

It's up to the Kernel/Exec to schedule tasks among the different cores. If one spawns subtasks, they may or may not run simultaneously on different cores. I think there might be some flags to instruct a task to run only on a certain core, etc.
It should be the scheduler taking care of things, and imho it should even be able to move tasks/threads around in case things heat up. But afaik that requires another implementation. This really is outside my comfort zone, so feel free to correct me on that.

On Windows, for example, you can choose to run your app on a different thread/core by setting the affinity. Afaik on AROS you have to manually choose on which core to run the code. I would expect (I might be wrong there) that eventually the scheduler in AROS will be able to figure out how to spread tasks automatically (and evenly) between cores/threads.
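To illustrate what "setting the affinity" means, here is a small sketch using Python's `os.sched_setaffinity`, which is only available on some Unix platforms such as Linux. It is neither the Windows API nor anything AROS provides; the function name `run_pinned_to_one_core` is made up, and the code simply returns `None` where the call is missing or refused.

```python
import os

def run_pinned_to_one_core():
    """Temporarily pin this process to one core.

    Returns the pinned affinity set, or None if the platform does not
    support it (assumption: the lowest allowed core is as good as any).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                            # call not available on this platform
    try:
        original = os.sched_getaffinity(0)     # 0 means "the calling process"
        os.sched_setaffinity(0, {min(original)})
        pinned = os.sched_getaffinity(0)
        os.sched_setaffinity(0, original)      # hand the choice back to the scheduler
        return pinned
    except OSError:
        return None                            # the platform refused the request

print(run_pinned_to_one_core())
```

Pinning only restricts where the scheduler may place the process; it does not make the program multithreaded.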

I would settle for having the OS (AROS) run on one thread/core and applications on another (or multiple others). But I have no idea how much overhead that would cause, and so whether it would make any sense. I have not experimented with it myself.
« Last Edit: January 10, 2019, 04:22:05 AM by magorium »



cdimauro

Reply #47 on: January 10, 2019, 03:01:11 PM
I wonder if it could happen during compilation.
Only if the programmer instructs the compiler on how to "treat" particular piece(s) of the source.
I heard that BeOS could make all its programs multi-threaded automatically.
At most it's an urban legend: only programMERs can write multi-threaded code, and... manually. An OS certainly cannot do it automatically.
Isn't Crysis a prime example of this?

Crysis was specifically coded by programmers to use more threads/cores.



trekiej

Reply #48 on: February 15, 2019, 12:47:15 AM
Bump.



hth313

Reply #49 on: February 16, 2019, 12:51:30 AM
What is the issue here? Is there a question?

Making parallel programs usually takes programmer action. There are three reasons for this. First, you often need some kind of synchronization mechanism between the tasks, to synchronize/lock access to shared resources or to communicate when something interesting (to other tasks) has occurred. This typically needs to be done manually by the programmer. The second reason is that the compiler or language is often not smart enough to figure out how to distribute parallel work, as it does not understand the algorithm or "shape" of the problem. It usually takes a human to figure out where to apply parallelism, sometimes by gut feeling or by trial and measuring typical scenarios. This matters because spawning parallel work may have an initial start-up cost, and doing it many times when there are far fewer execution units introduces unnecessary overhead. A third reason is that sometimes, even in a parallel-capable language, the program may need to be expressed in a way that makes it possible to parallelize.
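As a small illustration of the first reason, here is a sketch of several threads updating one shared counter in Python. The function name and the numbers are arbitrary. The lock is exactly the kind of thing the programmer has to add by hand: the read-modify-write in `total += 1` is not guaranteed to be atomic, so without the lock updates could be lost.

```python
import threading

def count_with_lock(thread_count=4, increments=10_000):
    """Have several threads increment one shared counter under a lock."""
    total = 0
    lock = threading.Lock()

    def add():
        nonlocal total
        for _ in range(increments):
            with lock:                # only one thread may update at a time
                total += 1

    threads = [threading.Thread(target=add) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                      # wait for every worker before reading
    return total

print(count_with_lock())
```

Taking the lock on every single increment also shows the overhead side: the threads spend much of their time waiting on each other.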

Locking and synchronization are typically done in imperative (and object-oriented) programming. They are very error-prone. If you say it is simple, you either have a very simple problem or do not know what you are talking about. As an example, I read about a couple of experts on this topic who wrote a program, and they really knew their stuff. Still, they created a very hard-to-find problem that took a couple of years before it showed up (unfortunately I forgot where I read about it). At the time I was doing this kind of programming myself, and thought that I could do it. Perhaps I could, because I did not find any problems, or maybe I did not stress it long enough. However, coming back to that code a year or two later, I could not wrap my head around it anymore. That has happened to me twice...

This is one reason why functional programming is gaining ground today: it makes it easier to utilize parallelism (and multiple cores) in a safer way.
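A rough sketch of the functional style in Python (not a functional language, but the idea carries over): the worker function is pure, nothing is shared, and so no locks are needed. The names `count_words` and `total_words` are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Pure function: the result depends only on the argument."""
    return len(chunk.split())

def total_words(chunks, workers=4):
    """Map the pure function over the chunks in a pool, then combine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_words, chunks))

print(total_words(["one two", "three four five"]))
```

Because each call only reads its own input, the run-time is free to evaluate the calls in any order or at the same time.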

There are languages that are naturally parallel as well; they typically have a model that the run-time can translate to multi-process evaluation. Still, the programmer may need to express the program in a way that works for it.

I remember trying this in a language called Parlog (which is a parallel variant of Prolog, a logic programming language). I had a lab problem that was tricky, figured out a way to solve it, only to be told that, well, it solved the problem, but it was not done in a parallel way. Back to the drawing board...

Parallelism also comes up in operating systems like AmigaOS, where it can be seen as a way to keep the system responsive and give it a good flow. It allows many things to happen at the "same" time.

With few cores, a proper multi-tasking system and occasional use of in-application (programmer-written) parallelism to improve responsiveness is often the best you can hope for. To gain performance from real parallelism, you need many cores to make it really worthwhile.

To summarize: typically it does not happen by itself in applications; the programmer often needs to think about it and do things.

But I still do not know what we are looking for here..?



trekiej

Reply #50 on: February 16, 2019, 10:13:16 AM
Thanks for sharing.



hyperlogik

Reply #51 on: June 28, 2019, 06:36:31 AM
There is no ARM CPU today that could touch even a Core2Duo CPU running single core, maybe in the future. The GPU in the Pi is way more powerful than the Intel GMA GPU in my laptop. Let's see if there will ever be native drivers for the Pi GPU running AROS. The problem might be that the next Pi's new GPU would need new drivers.

The best consumer ARM cores (like the Apple A12) are well into laptop CPU territory. The A12 is about on a par for single-core performance with the last couple of generations of low-voltage i3 and i5 parts, and probably with Haswell-generation desktop i3 parts. Some of the server kit is even quicker. The picture is complicated a bit on Android/Linux because of Linus's resistance to non-x86 CPUs. But ARM designs have advanced hugely performance-wise since around the time of the A15, and whatever they are like as a company, Apple's CPUs are just phenomenal.

As to the GPU getting a native driver: I wouldn't rule it out. The Pi Foundation has been awesome at persuading Broadcom to publish open source documentation and dev resources for the VideoCore IV in the first three generations, but I'm not sure that the same has been done for the latest model.