SOLR is a great search server, and in some cases I prefer it over Elasticsearch, because it makes it possible to execute powerful queries very easily via simple GET requests. This way, you don’t need a front-facing app that translates input requests into search queries.
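For illustration, a typical SOLR search is just a GET request against the select handler with a handful of standard parameters (the core name and field names below are made up):

GET /solr/core1/select?q=title:nginx&fq=year:2020&start=0&rows=10&wt=json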
But there is a catch. Although it is possible to expose the SOLR API directly to the public, this is not a very safe option, because you might inadvertently expose either:
- the whole SOLR API, including endpoints for updating the index, adding or removing cores and other internal functionality. This happens e.g. if you expose the SOLR port directly on the internet.
- the SOLR search handler API, if you put SOLR behind a restrictive reverse proxy (like Nginx). This is a better option, but you still expose the whole search API with all its functionality, so anyone can craft their own queries, run computationally expensive queries, or reveal internal data from the index that should not be public (!).
In both cases you also increase the possibility of DoS attacks and exploits, because you have a broader attack surface. Also, your frontend (web) app will depend on the SOLR API, so you have to fiddle with frontend code just to change some search options.
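To illustrate the second case: a ‘restrictive’ reverse proxy that simply forwards the search handler might look like the sketch below (the core name and upstream name are placeholders), and it still passes every client-supplied parameter straight to SOLR:

location /search {
    # forwards ALL client query parameters untouched to the SOLR select handler
    rewrite ^ /solr/core1/select break;
    proxy_pass http://solr6_server;
}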
Because of this, IMHO you always need an application that translates public requests into internal SOLR queries – OR – you can use an Nginx reverse proxy with a bit of Lua scripting for this! Without the need for another running app, without complicated routing code, without additional latency, complicated deployment and who knows what else.
Basic example using solr_args.lua
location /search {
    # build $args from the whitelisted GET parameters only
    set_by_lua_block $args {
        return require("solr_args")
            .reset()
            .wildcard_query(ngx.var.arg_q)
            .start(ngx.var.arg_start)
            .output_json(ngx.var.arg_o == 'json')
            .filter('arg_fy', '{!tag=YEAR}year', tonumber(ngx.var.arg_fy))
            .filter('arg_fm', '{!tag=MONTH}month', tonumber(ngx.var.arg_fm))
            .build();
    }
    # rewrite path to the SOLR select handler
    rewrite ^ /solr/core1/select break;
    # proxy request to the SOLR upstream
    proxy_pass http://solr6_server;
}
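This assumes Nginx is built with the Lua module (e.g. OpenResty) and that solr_args.lua is reachable via the Lua package path. A minimal sketch of the corresponding http-level configuration (paths and addresses are placeholders) could look like this:

http {
    # path to the solr-utils Lua files (placeholder), needed by require("solr_args")
    lua_package_path "/etc/nginx/lua/solr-utils/?.lua;;";

    # upstream referenced by proxy_pass above (example address, SOLR default port)
    upstream solr6_server {
        server 127.0.0.1:8983;
    }
}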
This configuration will proxy web requests to /search with these possible GET parameters:
- q= — for query
- start= — for paging
- o= — to change output type
- fy= — filter search on the year field, the original filter parameter will be returned as arg_fy in the result
- fm= — filter search on the month field, the original filter parameter will be returned as arg_fm in the result
Requests will be passed (proxied) by Nginx to solr6_server (an Nginx upstream definition), to the SOLR core named ‘core1‘ and its ‘/select‘ search handler, and Nginx will return the SOLR response to the client.
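To get a rough idea of the translation (assuming solr_args emits the usual SOLR query parameters – the exact output of the library may differ, and URL-encoding is omitted for readability), a public request like

GET /search?q=nginx&start=10&fy=2021

would be proxied to SOLR roughly as

GET /solr/core1/select?q=nginx*&start=10&fq={!tag=YEAR}year:2021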
It is also possible to further process the response within Nginx, because I think it’s a bit too chatty and big. You can, for example, reformat it within Nginx with the XSLT module. Just change the SOLR output type to XML, write your custom XSL template and add e.g.
xslt_stylesheet solr2myformat.xsl;
to the location handler. Although XSLT might seem to be a ‘bit’ complicated language if you have never seen it before, it is very powerful and very performant, so it fits perfectly here (and I like that technology ;-). I’m using it e.g. to produce custom JSON responses for DataTables or to prepare complete HTML chunks for inclusion in WordPress pages, etc.
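For example, the search location from above could be extended like this (xslt_stylesheet and xslt_types are standard directives of the Nginx XSLT module; the stylesheet path is a placeholder, and the SOLR response has to be XML, i.e. wt=xml):

location /search {
    # ... same set_by_lua_block / rewrite / proxy_pass as above ...

    # transform the XML response with a custom stylesheet before returning it
    xslt_stylesheet /etc/nginx/xslt/solr2myformat.xsl;
    # SOLR typically serves XML as application/xml, which is not in the default xslt_types
    xslt_types application/xml;
}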
This whole setup is very easy to maintain and develop. Once you have the complicated part done (like setting up and configuring SOLR), and you know your queries, writing the translating API is really easy. And if you care about performance: I’m using something similar on a site handling roughly 900k requests a month, and the server CPU usage is usually under 10% while serving responses (well) under 100 ms.
You can find the Lua packages in question here: https://github.com/samsk/solr-utils/tree/master/lua. The documentation is scarce, but the code is quite simple, and IMHO the package functions can be understood easily.
If you need my help with a project that requires a high-performance search engine, don’t hesitate to contact me.