Re: [Chicken-hackers] CHICKEN in production


From: r
Subject: Re: [Chicken-hackers] CHICKEN in production
Date: Wed, 08 Oct 2014 04:30:51 +0400
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0


On 08.10.2014 at 1:31, Oleg Kolosov wrote:
On Oct 7, 2014, at 10:04 PM, Peter Bex <address@hidden> wrote:
On Tue, Oct 07, 2014 at 01:13:09AM +0400, Oleg Kolosov wrote:

Hello Oleg,

Thanks for providing some more information about your project!
I think these kinds of postmortem analyses are very interesting
and we should take to heart all the lessons learned, and use them
to improve CHICKEN.

We are trying to avoid using Chicken as a ‘glue’ because we figured that FFI transitions can be a major bottleneck (especially for strings).
The overhead of calling C should be pretty minimal in the usual cases,
unless strings are the only problem.  If it's the only dealbreaker,
I think that should be fixable.
Yes, FFI overhead is within 5% of a pure C program for simple cases: passing around immediate values and pointers. But we have a very important use case: fuzzy search in the song info database, which holds tens of thousands of records. We use a custom, highly tuned indexing algorithm. The initial implementation was written in Scheme; it was small and beautiful (according to its author) but unusably slow. We tried to tune it, but measured that the cost of passing strings through the FFI was still too high, and unavoidable because of copying. Additionally, there were some performance problems with Unicode handling. So now we use libc locale functions for conversions and do the indexing and processing in C. This is a pain, but it is at least 3 times faster than Chicken. There is still a lot of trickery on the GUI side to provide responsive incremental search, because the amount of data returned is still quite large.

The main problem is that the pure Scheme implementation doubles memory usage (half of the memory is reserved for the copying GC), and a full search index tree contains about 2M nodes (approx. 40*2MB). Also, the library has to run on iOS/Android/Windows (as an offline catalog on the user's device or PC), and packaging libchicken with eggs is very tricky.
Performance tests show that the reference implementation is 2-3 times slower than C.

Passing strings through the FFI is pretty slow. For example, the c-string type was used to represent a color value, and almost every drawing function requires a color argument. At some point we noticed a huge performance degradation. Replacing c-string with scheme-pointer + ##sys#size helped a lot; because that approach requires the data length, as a bonus we get validation of malformed UTF-8 strings.
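To illustrate the pointer-plus-length approach described above, here is a minimal C sketch of the receiving side. It is not the project's code: the function name utf8_valid is hypothetical, and it only assumes that the Scheme side hands over a pointer to the string data together with its byte length (no NUL terminator, no temporary copy). Because the length is known up front, the bytes can be checked for well-formed UTF-8 before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the first `len` bytes at `s` are
   well-formed UTF-8, 0 otherwise. The buffer does not need to be
   NUL-terminated, and embedded NUL bytes are accepted as U+0000. */
int utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        size_t need;    /* number of continuation bytes expected */
        uint32_t cp;    /* code point being decoded */
        uint32_t min;   /* smallest code point allowed for this length */

        if (c < 0x80) { i++; continue; }                 /* ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;  /* stray continuation byte or invalid lead byte */

        if (len - i - 1 < need)
            return 0;   /* sequence is cut off by the end of the buffer */

        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates, and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        i += need + 1;
    }
    return 1;
}
```

A drawing function could call such a check once on entry and then parse the color directly from the buffer, without a strlen pass or a terminator.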
  

        
And adding Chicken to a C program makes normal analysis and debugging tools pretty much useless (for finding memory leaks and such), so the hardware interfacing layer is pure C with separate high-level FFI bindings on top.
It takes some more practice, but debugging C code called from CHICKEN is
quite doable in my experience, but then I've never done huge C & CHICKEN
projects, only smaller libraries.  Could you explain a bit more what the
problems are you ran into?
Yes, I’ve done some debugging of generated code for the Windows port. It is possible in principle, but it requires some familiarity with the implementation and is used only as a last resort (mysterious crashes and such). In reality the call stacks are almost infinite, and it is hard to pinpoint the interesting parts within the wall of f1234 functions. Useful info about passed arguments and such is left in the generated comments, so you need to inspect the sources with the ‘list’ command to view it. We tried to improve this by inserting #line directives, without much success: the code generator is too complex, especially where the FFI is involved. Instead, we insert logging statements everywhere. Unfortunately, logging considerably uglifies the code and makes some functional programming idioms much harder to use (like map/fold/cut one-liners). Also, various analysis tools like Valgrind and the libc malloc checkers fall flat when Chicken is involved.

On MSVC (2008), adding #line directives triggers a compilation error when a directive is placed between macro parameters. The C backend is full of C_xxx macro calls, so I failed to fix this.
Compiled with GCC 4.6, a sample program steps nicely line by line through the *.scm file, but sometimes it goes crazy and jumps to random empty lines xD; such source lines did not have any #line reference in the *.c file. Watching locals or anything nearby is very hard even with renamed tXXX identifiers, because the fXXX continuations break the scope every time.
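For readers unfamiliar with the mechanism, the following small C sketch shows what a #line directive does in generated code. The file name example.scm and the function names are hypothetical. The directive tells the compiler (and therefore the debugger) to treat the next physical line as line 42 of the named file. The comment also shows the problematic placement: in ISO C, a preprocessing directive inside a macro's argument list has undefined behavior, so a compiler rejecting it, as reported above for MSVC, is permitted.

```c
/* Pretend this C file was generated from "example.scm" (hypothetical name).
   After the directive, the next physical line counts as line 42 of that
   file, and __FILE__ reports the Scheme source name. */
#line 42 "example.scm"
int mapped_line(void) { return __LINE__; }
const char *mapped_file(void) { return __FILE__; }

/* Not portable: a directive placed between the arguments of a macro call
   is undefined behavior in ISO C, so a compiler may refuse it:

     C_hypothetical_macro(a,
     #line 7 "example.scm"
                          b);
*/
```

A code generator that wants portable output therefore has to emit each directive on its own line before the whole statement, which is awkward when a single statement spans several source lines.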


      

        
We also struggled a lot with the posix and process control functions (long story). Trying to be functional here backfires badly, so we ended up with straightforward and ugly code (looking like verbose C with parentheses), replacing some functions from the standard library (namely process-run) and customizing error handling.
Would you care to unpack this a little?
We are trying to simulate parallel processing and separate responsibilities with worker processes communicating through sockets. There are also message-passing threads involved for monitoring and control. Judging by the history, this may be the buggiest part of the project, with numerous workarounds and special-case handling. SIGINT handling is still buggy, but that is not critical for production. Yes, the task is complex, but the API is also too confusing and fragile. It might be adequate for C, but in Scheme a lot of feet were shot off.

POSIX process management and interrupts are ancient, pure evil.


      

        
There were a few problems (I don’t remember them clearly) with preemptive scheduling, so we are using strategically placed, carefully adjusted sleeps with manual yields. I’ve borrowed a few ideas from the Chicken implementation and made a video player (used for the background: pure C, no FFI, no GUI) that abuses the libuv event loop as a CPS trampoline. The code looks strange to a casual observer but performs surprisingly well. I’ve not yet figured out how to wrap this as an egg (managing C callbacks is hard).
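The idea of a CPS trampoline can be sketched in plain C as follows. This is not the player's code and it does not use libuv: a simple while loop stands in for the event loop, and all names (struct step, trampoline, sum_to) are hypothetical. Each step does a little work and returns the next step instead of calling it, so the C stack never grows no matter how long the chain is.

```c
#include <stddef.h>

/* A step is a function plus its state; it returns the step to run next. */
struct step;
typedef struct step (*step_fn)(void *state);
struct step {
    step_fn next;   /* NULL means the computation is finished */
    void *state;
};

/* Example computation: accumulate n + (n-1) + ... + 1, one step at a time. */
struct sum_state {
    unsigned long long n;
    unsigned long long acc;
};

static struct step sum_step(void *p)
{
    struct sum_state *s = p;
    struct step out = { NULL, p };

    if (s->n > 0) {
        s->acc += s->n;
        s->n--;
        out.next = sum_step;   /* "call" the continuation by returning it */
    }
    return out;
}

/* The trampoline: the only place where steps are actually invoked. */
void trampoline(struct step s)
{
    while (s.next != NULL)
        s = s.next(s.state);
}

unsigned long long sum_to(unsigned long long n)
{
    struct sum_state st = { n, 0 };
    struct step first = { sum_step, &st };

    trampoline(first);
    return st.acc;
}
```

In an event-loop variant, presumably each returned step would be queued as a callback rather than run immediately, which lets I/O events interleave with the computation.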
Sounds interesting.  So at least you got something out of it aside from
just frustration ;)
There were some discussions about replacing the Chicken scheduler with the libuv event loop and providing a filesystem and socket API on top of it. The scheduler modification is necessary to block green threads in order to simulate synchronous calls. There is a lot of custom and confusing code in Chicken around the select function, with workarounds for Windows. We think that the libuv implementation is superior. There is some proof-of-concept code, but we’ve not progressed too far with this yet.


        
So, in the end, there are some great things (see video in this thread) to showcase, but for me (low-level and performance stuff mostly) it was more pain than joy.
If you can pinpoint the exact places where performance is particularly
bad we can (at least attempt to) fix them.
Passing a large number of C strings back and forth through the FFI, and UTF-8 handling (we tested uppercase conversion and trimming, AFAIR). Updating with defstruct is horribly slow; I don’t know all the details, I just heard the conversation.

The scheduler is still active even with disable-interrupts. This is very hard to diagnose, but mysterious bugs are fixed by going down to C and not returning until everything is settled (like fork -> exec). It would be nice to have an option to get rid of it, i.e. for performance-critical parts we would like to have complete manual control, without interrupt handling and similar code inserted.
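The "go down to C and do not return until everything is settled" pattern for fork -> exec might look roughly like the sketch below. It is an illustration under assumptions, not the project's replacement for process-run: the function name spawn_and_wait is hypothetical, and it uses only standard POSIX calls (fork, execvp, waitpid). The whole fork/exec/wait sequence runs inside one C function, so no Scheme code executes between the steps.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with arguments argv (NULL-terminated),
   wait for it, and return its exit status (0..255). Returns -1 if fork or
   waitpid fails or the child did not exit normally. A child that cannot
   exec reports status 127, following the shell convention. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */

    if (pid == 0) {
        /* Child: replace the process image; nothing else runs here. */
        execvp(argv[0], argv);
        _exit(127);             /* reached only when exec failed */
    }

    /* Parent: wait for this child, retrying if a signal interrupts us. */
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Using _exit rather than exit in the failed-exec path avoids running atexit handlers and flushing stdio buffers that were inherited from the parent.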

Maybe with the upcoming CHICKEN 5 changes the core will become more hacker-friendly!


      

        
There are currently heated internal discussions about migrating to something more widely supported (with a proper debugger, profiler, and other useful tools) for our next big project, because the new hardware is more powerful and there are fewer restrictions.
This is a bit of a tidal function: sure, hardware gets faster every year,
but then they invent some new class of device which is more constrained
than the previous generation, or a new niche of computation evolves where
every CPU cycle is precious (bitcoin mining? 3D games?).  So even though
there are lots of people falling over each other trying to tell you that
"hardware doesn't matter" and you should use their slow-ass language,
that's just bullshit: performance will *always* matter.
This is true. But our new platform is even more customized for the given use cases and contains various specialized hardware to assist the CPU (like DSPs and ADCs/DACs). It is still an early prototype, but we are discussing how many cycles we are ready to burn for a supposedly faster and more straightforward development process.

-- 
Regards, Oleg
Art-System




_______________________________________________
Chicken-hackers mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/chicken-hackers

