Posts

Showing posts from September, 2019

Django middleware - Crawler detection

Sometimes you have to detect search engine crawling activity on your system so that you can handle this type of request differently. Our website sees a lot of crawling activity, because BV FAPESP (https://bv.fapesp.br) provides useful information about research for science, technology and academia in the state of São Paulo, Brazil. Recently, I deployed a feature that lets people store the queries and filters they run on our website. The feature is open to everybody who browses our system. It is based on the browser's HTTP session, which is stored server-side by Python/Django. While tuning the Django session table in the database, we found that a lot of sessions had been created by crawling activity. To take control of session creation, I wrote the middleware that follows. I took the list of crawlers from nginx logs over a very small timeframe, so there may be more search engines that are not listed. The use-case presented, is j
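As a rough illustration of the idea described in this excerpt (not the post's actual middleware, which is cut off here), a crawler-detection middleware could look like the sketch below. The token list, the `is_crawler` helper, the class name and the `request.is_crawler` attribute are all assumptions made for the example. It follows the shape of a new-style Django middleware (a callable that receives `get_response`), and it only flags the request, so views can decide not to touch `request.session` for crawlers.

```python
import re

# Hypothetical and deliberately incomplete list of User-Agent fragments.
# A real list would be built from your own access logs.
CRAWLER_TOKENS = ("googlebot", "bingbot", "yandexbot", "baiduspider", "duckduckbot")

# One case-insensitive pattern that matches any of the tokens.
CRAWLER_RE = re.compile("|".join(re.escape(t) for t in CRAWLER_TOKENS), re.IGNORECASE)


def is_crawler(user_agent):
    """Return True if the User-Agent string contains a known crawler token."""
    return bool(user_agent) and CRAWLER_RE.search(user_agent) is not None


class CrawlerDetectionMiddleware:
    """Flag requests that appear to come from a search engine crawler.

    The flag is stored as ``request.is_crawler`` (a name chosen for this
    sketch), so later code can skip session writes for crawlers.
    """

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Django exposes request headers in request.META with an HTTP_ prefix.
        user_agent = request.META.get("HTTP_USER_AGENT", "")
        request.is_crawler = is_crawler(user_agent)
        return self.get_response(request)
```

To try it, the class would be added to the `MIDDLEWARE` list in the Django settings using whatever dotted path your project gives it, and a view could then check `request.is_crawler` before storing anything in `request.session`.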

Connect Django Haystack to Solr Cloud

At BV FAPESP (www.bv.fapesp.br) we use Solr as the search engine backend, and a library called Haystack to tie Solr to Django. In 2018, my team and I wrote a Python/Django library to use with Apache Solr in cloud mode. We avoided the Django/Haystack library because it did not support some features, such as grouping, Streaming Expressions and Graph Analysis. So far so good: before the end of the project I had Solr Cloud running smoothly in production, but I still had a single Solr instance running with Haystack, because we didn't re-code the whole system and some legacy code still uses Haystack. To turn off the single Solr instance, we moved all documents to Solr Cloud and connected Haystack to it. That is what I document here, for myself and maybe for you, if you are trying to do the same.

Step-by-step

There is a Solr Cloud Python backend for Haystack, which you can find here: https://github.com/django-haystack/django-haystack/pull/1580/commits/13df4a9e69ececd5567636085df4