Connect Django Haystack to Solr Cloud

At BV FAPESP (www.bv.fapesp.br) we use Solr as the search engine backend, and a library called Haystack to tie Solr to Django.

In 2018, my team and I wrote a Python/Django library to use with Apache Solr in cloud mode. We avoided the Django Haystack library because it did not support some features we needed, such as grouping, Streaming Expressions, and Graph Analysis.

So far so good: before the end of the project I had Solr Cloud running smoothly in production. But I still had a single standalone Solr instance running with Haystack, because we didn't re-code the whole system and some legacy code still uses Haystack.

To turn off the single Solr instance, we moved all documents to Solr Cloud and connected Haystack to it. That is what I document here, for myself and maybe for you, if you are trying to do the same.

Step-by-step

There is a Solr Cloud Python backend for Haystack, which you can find here:
https://github.com/django-haystack/django-haystack/pull/1580/commits/13df4a9e69ececd5567636085df4e353ce540a35

Copy this file into the Haystack package inside your env/virtualenv folder, like this:
/lib/python2.7/site-packages/haystack/backends/solrcloud_backend.py
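
If you are not sure where that folder is in the active environment, a small helper can compute the destination path for you. This is only a sketch: the function name backend_destination is made up for this post, it assumes Python 3 (importlib.util), and it assumes the package keeps a "backends" subfolder as Haystack does.

```python
import importlib.util
import os


def backend_destination(package="haystack", filename="solrcloud_backend.py"):
    """Return the path <package dir>/backends/<filename> for an installed package.

    Hypothetical helper: it only computes the path, it does not copy anything.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ImportError("package %r was not found in this environment" % package)
    # A regular package has exactly one search location: its own directory.
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, "backends", filename)
```

Run it with the virtualenv's own Python interpreter, so the path it returns belongs to the environment your site actually uses.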

Use this version of pysolr.py:
https://github.com/django-haystack/pysolr/blob/master/pysolr.py


ZooKeeper / Kazoo

You must use ZooKeeper in a production environment. pysolr talks to it through the kazoo library:
pip install kazoo

Configuring

For the configuration below, follow the comment here:
https://github.com/django-haystack/django-haystack/pull/1580#issuecomment-399378902

Put these lines in your settings/local_settings.py, adjusted to your infrastructure.

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'solrcloud_backend.SolrCloudEngine',
        'URL': '127.0.0.1:9983',  # this is the ZooKeeper
        'COLLECTION': 'gettingstarted',  # example SolrCloud collection
    },
}

Then change these settings in pysolr.py to:

[...]

ZooKeeper.CLUSTER_STATE = '/collections/{}/state.json'.format(connection_options['COLLECTION'])
zookeeper = ZooKeeper(connection_options['URL'])
[...]
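
To make it clearer what that CLUSTER_STATE path points to, here is a rough sketch of reading a collection's state.json and listing the Solr nodes that are up. It is not the code from pysolr. The function names and the sample data are made up, and the JSON layout (collection, then "shards", then "replicas" with "state" and "base_url") is my assumption about how Solr Cloud writes the per-collection state.

```python
import json


def cluster_state_path(collection):
    """Build the ZooKeeper path of a collection's state, as in the setting above."""
    return "/collections/{}/state.json".format(collection)


def active_base_urls(state_bytes, collection):
    """Return the sorted, de-duplicated base URLs of replicas marked 'active'.

    Assumes the state.json layout described in the text above.
    """
    state = json.loads(state_bytes.decode("utf-8"))
    urls = set()
    for shard in state[collection]["shards"].values():
        for replica in shard["replicas"].values():
            if replica.get("state") == "active":
                urls.add(replica["base_url"])
    return sorted(urls)


# Made-up sample, shaped like the assumed state.json of a 2-shard collection.
SAMPLE_STATE = json.dumps({
    "gettingstarted": {
        "shards": {
            "shard1": {"replicas": {
                "core_node1": {"state": "active", "base_url": "http://127.0.0.1:8983/solr"},
                "core_node2": {"state": "down", "base_url": "http://127.0.0.1:7574/solr"},
            }},
            "shard2": {"replicas": {
                "core_node3": {"state": "active", "base_url": "http://127.0.0.1:7574/solr"},
                "core_node4": {"state": "active", "base_url": "http://127.0.0.1:8983/solr"},
            }},
        }
    }
}).encode("utf-8")

print(cluster_state_path("gettingstarted"))
print(active_base_urls(SAMPLE_STATE, "gettingstarted"))
```

Against a real cluster, the bytes would come from kazoo: create KazooClient(hosts='127.0.0.1:9983'), call start(), then get(cluster_state_path('gettingstarted')), which returns a (data, stat) tuple.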

wsgi.py

If you use uWSGI, take care to set PATH and PYTHONPATH correctly when you change your environment / virtualenv.

This is a reminder for myself, for when I create a new virtualenv and copy over an already configured wsgi.py.
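
One way to keep a copied wsgi.py from silently using the wrong environment is to add the virtualenv's site-packages folder explicitly, before Django is imported. A minimal sketch, assuming Python 3. The function name use_site_packages is made up, and the folder you pass in depends on your own virtualenv.

```python
import os
import site
import sys


def use_site_packages(site_packages_dir):
    """Register a virtualenv's site-packages folder on sys.path.

    site.addsitedir also processes any .pth files found in that folder,
    which a plain sys.path.append would skip.
    """
    path = os.path.abspath(site_packages_dir)
    site.addsitedir(path)
    return path
```

In wsgi.py this call would go at the top, before the Django imports. uWSGI also has its own virtualenv option, which may be the simpler choice if you control the uWSGI configuration.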
