Some hopefully useful tips on speeding up data import via SOLR DataImportHandler.
The latest version of the log-malloc2 library has (IMHO) a unique little feature that makes it well suited for unit testing memory allocations. It provides a simple API for querying actual memory usage at runtime. This makes it possible to compare usage before entering and after leaving a function, to ensure that there are no memory leaks inside it.
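To illustrate the before/after comparison, here is a minimal sketch of the pattern. It does not use the log-malloc2 API: current_usage(), tracked_malloc(), tracked_free() and usage_delta() are hypothetical stand-ins written for this example, so the snippet compiles on its own. With the real library, the usage query would come from its header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a runtime usage query: bytes currently
 * allocated through the counting wrappers below. */
static size_t tracked_bytes = 0;

/* Each block is prefixed with a header that stores its size, so the
 * free wrapper knows how much to subtract. The union keeps the user
 * pointer suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} header_t;

static void *tracked_malloc(size_t size)
{
    header_t *h = malloc(sizeof(header_t) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    tracked_bytes += size;
    return h + 1;
}

static void tracked_free(void *ptr)
{
    if (ptr == NULL)
        return;
    header_t *h = (header_t *)ptr - 1;
    assert(tracked_bytes >= h->size);
    tracked_bytes -= h->size;
    free(h);
}

static size_t current_usage(void)
{
    return tracked_bytes;
}

/* Example function under test that frees what it allocates. */
static void balanced(void)
{
    void *p = tracked_malloc(128);
    tracked_free(p);
}

/* Example function under test that leaks 64 bytes on purpose. */
static void leaky(void)
{
    (void)tracked_malloc(64);
}

/* Net change in usage across one call to fn; 0 means nothing leaked. */
static long usage_delta(void (*fn)(void))
{
    size_t before = current_usage();
    fn();
    return (long)current_usage() - (long)before;
}
```

A unit test would then assert that usage_delta() is zero for every function that is expected to clean up after itself; for the deliberately leaky example it reports the 64 bytes that were never freed.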
The new version of log-malloc2 provides new helper functions and scripts that make printing and analyzing backtraces easy and convenient.
log-malloc2_util.h provides a few fully inlined functions:
- Pre-initializes the backtrace() function to avoid any later memory allocations. Calling this function is optional, but it’s good to call it at program start if you want to generate a backtrace in a SIGSEGV signal handler (memory allocations in a SIGSEGV handler should be avoided if possible).
2. ssize_t log_malloc_backtrace(int fd)
- Prints the current backtrace to the given file descriptor, including the process memory map (/proc/self/maps) to make backtrace symbol conversion easier (this is needed because of ASLR).
- The generated output can be pasted directly into the backtrace2line script, which converts it to a human-readable stack trace (ASLR is supported).
Because both functions are inlined, there is no need to link the program against the log-malloc2 library, which also makes them a bit easier to use in a segfault (SIGSEGV) signal handler.
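The general technique described above can be sketched with plain glibc calls. This is not the log-malloc2 implementation, and the output is not guaranteed to match what backtrace2line expects; prime_backtrace(), copy_file_to_fd(), dump_backtrace(), on_segv() and install_segv_handler() are hypothetical names chosen for this example. It assumes Linux with glibc, where backtrace() and backtrace_symbols_fd() come from execinfo.h.

```c
#include <execinfo.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Call backtrace() once at startup so that its one-time lazy setup
 * happens here and not later inside the signal handler. */
static void prime_backtrace(void)
{
    void *frames[MAX_FRAMES];
    (void)backtrace(frames, MAX_FRAMES);
}

/* Copy a file to fd using only open/read/write/close.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t copy_file_to_fd(const char *path, int fd)
{
    char buf[1024];
    ssize_t total = 0, n;
    int in = open(path, O_RDONLY);
    if (in < 0)
        return -1;
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        if (write(fd, buf, (size_t)n) != n) {
            close(in);
            return -1;
        }
        total += n;
    }
    close(in);
    return n < 0 ? -1 : total;
}

/* Write the current backtrace followed by the process memory map to fd.
 * The map is what allows raw addresses to be resolved under ASLR.
 * Returns the number of stack frames captured. */
static int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, count, fd);
    copy_file_to_fd("/proc/self/maps", fd);
    return count;
}

/* SIGSEGV handler: dump to stderr, then re-raise with the default
 * action so the process still terminates as a crash. */
static void on_segv(int sig)
{
    dump_backtrace(STDERR_FILENO);
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Call once at program start. */
static void install_segv_handler(void)
{
    prime_backtrace();
    signal(SIGSEGV, on_segv);
}
```

backtrace_symbols_fd() is used instead of backtrace_symbols() because it writes straight to the descriptor without allocating memory for the result, which matters inside a SIGSEGV handler.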
I’ve just released the log-malloc2 library for Linux. It logs calls to memory allocation functions and should be very helpful when trying to locate memory leaks. It can be used without recompiling the application, simply by preloading it using LD_PRELOAD.
Every function call is logged with its parameters, the amount of allocated/deallocated memory, the total amount of allocated memory, a copy of the /proc/self/statm content and a backtrace (call stack) if possible. Additionally, function call counts are logged and printed when the application exits.
LD_PRELOAD=../liblog-malloc2.so PROGRAM ARGUMENTS 1022>/tmp/malloc.log
A more complete description of the library, its usage, logging format and internals can be found in the README file.
The library can be downloaded from the project homepage. Currently there is no script included to help with malloc log file analysis, but the log file format is very simple and it should not be hard to write your own that does exactly what you want.
Note that using this library harms application performance if the application makes intensive use of allocation/deallocation functions. Consider logging to a file located on a tmpfs (in-memory) filesystem to improve logging I/O throughput 😉
Released solr_pager 0.2.2. This release fixes a little bug that prevented solr_pager from working in some configurations (i.e. with the standard search handler). Well, the Solr documentation did not mention that it can pass an empty (null) result set to my component.
I also added some sanity checks to prevent a similar situation from happening again.
The new version can be found at http://devel.dob.sk/solr_pager, enjoy.