Live profiling RTS progress

Posted on June 22, 2016

The heavy work under my HSOC project has begun. The first and the most important point in my plan is implementation of RTS API for following actions with the eventlog subsystem:

  • Redirecting event stream to user specified file descriptor.

  • Piping all events via memory into the Haskell side.

  • Dynamic resizing of internal buffers.

The patched GHC repo is located here and the current Haskell side repo is here.

/*
 * Set custom file stream for global event log sink.
 *
 * The function overwrites previous event log file pointer. Previous 
 * sink is closed only if closePrev flag is on.
 *
 * Writing to the sink is protected by global mutex.
 *
 * The function puts header to the new sink only when emitHeader flag
 * is on. User might not want the header if it is switching to 
 * already existed eventlog handle that was switched away recently.
 */
void rts_setEventLogSink(FILE *sink,
                         StgBool closePrev,
                         StgBool emitHeader);

/*
 * Get current file stream that is used for global event log sink.
 *
 * You shouldn't do anything with the pointer until 
 * rts_setEventLogSink(otherFileOrNull, false) is called. After that 
 * you can do anything with the file stream.
 */
FILE* rts_getEventLogSink(void);

/*
 * If RTS started with '-lm' flag then eventlog is stored in memory buffer.
 * 
 * The function allows to pop chunks of the buffer. Return value of 0 means
 * that there is no any filled chunk of data.
 *
 * If the function returns nonzero value the parameter contains full chunk 
 * of eventlog data with size of the returned value. Caller must free the
 * buffer, the buffer isn't referenced anywhere anymore.
 *
 * If nobody calls the function with '-lm' flag then the memory is kinda
 * to be exhausted.
 */
StgWord64 rts_getEventLogChunk(StgInt8 **ptr);

/*
 * Reallocate inner buffers to match the new size. The size should be not
 * too small to contain at least one event.
 *
 * If RTS started with '-lm' the chunks of memory buffer is also resized.
 */
void rts_resizeEventLog(StgWord64 size);

/*
 * Return current size of eventlog buffers.
 */
StgWord64 rts_getEventLogBuffersSize(void);
-- | The 'setEventLogHandle' function changes current sink of the eventlog, if eventlog
-- profiling is available and enabled at runtime.
--
-- The second parameter defines whether old sink should be finalized and closed or not. 
-- Preserving it could be helpful for temporal redirection of eventlog data into not 
-- standard sink and then restoring to the default file sink.
--
-- The third parameter defines whether new header section should be emitted to the new
-- sink. Emitting header to already started eventlog streams will corrupt the structure 
-- of eventlog format.

-- @since 4.10.0.0
setEventLogHandle :: Handle -> Bool -> Bool -> IO ()

-- | The 'setEventLogCFile' function changes current sink of the eventlog, if eventlog
-- profiling is available and enabled at runtime.
--
-- The second parameter defines whether old sink should be finalized and closed or not. 
-- Preserving it could be helpful for temporal redirection of eventlog data into not 
-- standard sink and then restoring to the default file sink.
--
-- The third parameter defines whether new header section should be emitted to the new
-- sink. Emitting header to already started eventlog streams will corrupt the structure 
-- of eventlog format.
--
-- The function is more low-level than 'setEventLogHandle' but doesn't recreate underlying
-- file descriptor and is intended to use with 'getEventLogCFile' to save and restore 
-- current sink of the eventlog.
--
-- @since 4.10.0.0
setEventLogCFile :: Ptr CFile -> Bool -> Bool -> IO ()

-- | The 'getEventLogCFile' function returns current sink of the eventlog, if eventlog
-- profiling is available and enabled at runtime.
--
-- The function is intented to be used with 'setEventLogCFile' to save and restore 
-- current sink of the eventlog.
--
-- @since 4.10.0.0
getEventLogCFile :: IO (Ptr CFile)

-- | Setting size of internal eventlog buffers. The size should be not
-- too small to contain at least one event.
--
-- If RTS started with '-lm' the chunks of memory buffer is also resized.
--
-- The larger the buffers the lesser overhead from event logging, but 
-- larger delays between data dumps.
--
-- See also: 'getEventLogChunk', 'getEventLogBufferSize'
setEventLogBufferSize :: Word -> IO ()

-- | Getting size of internal eventlog buffers.
--
-- See also: 'setEventLogBufferSize', 'getEventLogChunk'
getEventLogBufferSize :: IO Word

-- | Get next portion of the eventlog data.
--
-- If RTS started with '-lm' flag then eventlog is stored in memory buffer.
-- 
-- The function allows to pop chunks out of the buffer. Return value of Nothing 
-- means that there is no any filled chunk of data.
--
-- If the function returns nonzero value the parameter contains full chunk 
-- of eventlog data with size of the returned value. Caller must free the
-- buffer with 'free' from 'Foreign.Marshal.Alloc', the buffer isn't referenced 
-- anywhere anymore.
--
-- If nobody calls the function with '-lm' flag on then the memory is kinda
-- to be exhausted.
--
-- If '-lm' flag is off, the function returns always 'Nothing'.
--
-- See also: 'setEventLogBufferSize'
getEventLogChunk :: IO (Maybe CStringLen)

Design choices

To get data from RTS side about occured events one can imagine three different approaches.

Substitution of file descriptor

The idea is simple, user specifies own file descriptor as sink for the eventlog. The most easiest solution to implement, but has some caveats.

Firstly, you will ultimately skip initial events that carry very important info about capability sets, program arguments and spawned threads. The events are spawned before the main thread starts, so you cannot redirect the events to custom sink. The caveat could be workarounded by force flushing the default eventlog file and reading the initial events from it.

Secondly, one should use memory pipes represented as a file descriptor (FILE* actually), that is quite platform dependent. POSIX systems can use mkfifo(3) and Windows has its own NamesPipes. And finally, I quickly hit an issue when RTS blocks on writing to the pipe (created with mkfifo) as reader thread in Haskell side was slower than writer on the RTS side.

Despite the problems I encountered, the file descriptor substitution API that I implemented is still useful for temporal eventlog disabling (setting NULL as descriptor) and for simple tasks like redirecting the events to non-standard file.

Making arbitrary hook on data dumping

It is the most flexible solution one can imagine as user can do anything with the data passed into the hook. But after some brainstorming of call sequence I realized that the approach is likely to cause deadlocks.

Let’s say we made a Haskell hook that writes to some memory buffer or socket. During the action a GC or other events might occur that can overflow an internal buffer. Global eventlog sink is protected by a mutex that is kept when we enters the hook callback, and the hook will provoke another eventlog dump that tries to acquire lock of the mutex again. Despite the fact of double locking of the mutex there would be possible infinite recursive call of the hook as each callback call can provoke the eventlog dumping.

I was playing around with the hook implementations and found another source of problems. If the hook contains internal foreign calls the RTS will crash with reentered to scheduler error. The problem can be easyly hit by using POSIX memory pipes as they require some foreign calls.

The only safe way of using such hook is use it only from C code or any other language that cannot produce new eventlog messages. I decided to remove the hooking API from RTS when superseding design was implemented. It was simply too dangerous to use.

Storing in memory

The most complicated solution when RTS has its own extendible memory buffer to store eventlog data until Haskell side not consume it. The tricky part is maintaining of memory usage, the method I designed is shown on the picture below. RTS keeps a linked list of memory blobs where new events are pushed to tail and the Haskell side pops out the head of buffer.

The solution is portable and has advantage in reusing of memory of popped chunks. I create a bytestring from the buffer without copying from the popped chunked. So I have sticked to the solution as good enough.

The buffer is created and used only when special RTS flag -lm is set to prevent occasional memory exhaustion when nobody pops the log head.

The story about reallocation

I have run into numerous segfaults on my way to operational design with dynamically sized buffers. The main reason of segfaults was that not all eventlog operations are protected by mutexes. As each capability has its own buffer, there is no need for locking that is very expensive. The diagram of data flow is presented below.

I successfully designed a solution when no additional locks are needed. The Haskell side posts only a request for resize and actual resizing is performed by the capability that owns a buffer being resized. Now user can control granularity of live logging as smaller buffers will flush frequently.

Haskell side

Currently Haskell side library for live monitor simply prints results of incremental parsing of the eventlog piped from chunked buffer. Thanks a lot to Karl Velicka for amazing PR with incremental parser for ghc-events!

My next goal is entention of the library to support TCP/UDP protocol for communication with future live profiler client. There is important constituent part of eventlog state management that have to support connecting and disconnecting clients at arbitrary time.