Having the ability to catalog and query all files stored on a server is essential. For instance, when dealing with large collections of digital documents like PDFs, a searchable index can prove invaluable for quickly locating specific content.
What is Sist2?
Sist2 is a high-performance file indexing and search tool, built with C and VueJS, that uses Elasticsearch for fast content retrieval.
Over time, I've accumulated a substantial collection of PDFs, including a large cache of repair manuals from iFixit that was shared publicly on Reddit several years ago. With roughly 4,000 individual PDF files, the sheer size of the collection was a challenge, and I never found an effective way to organize and search through it.

When I set up Sist2 using Docker, it proved remarkably efficient: it quickly indexed all of the PDF repair manuals, even with Optical Character Recognition (OCR) enabled, and let me search both the filenames and the contents of the documents.
Sist2 Core Features
- Scanning & Scheduling: Manage scan jobs via a simple web interface
- Performance: Fast, low memory usage, multi-threaded scanning
- File Analysis: Extracts text and metadata, and generates thumbnails, from various file types
- Incremental Scanning: Scan files only when changed
- Tagging & Scripting: Manual and automatic tagging via UI or user scripts
- Archive Support: Recursive scan inside archive files
- OCR Integration: Uses Tesseract for optical character recognition
- Visualization: Stats page with disk utilization visualization
- Named-Entity Recognition: Client-side named-entity recognition (NER)
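To make the incremental scanning idea from the list above concrete, here is a minimal Python sketch of one common way to do it: remember each file's modification time and only report files whose timestamp changed since the last run. This is my own illustration, not Sist2's actual implementation; the function name `changed_files` and the JSON state file are assumptions made up for the example.

```python
import json
import os
from pathlib import Path


def changed_files(root, state_file):
    """Return files under root that are new or modified since the last call.

    state_file is a hypothetical JSON file (kept outside root) that stores
    the modification time seen for every path during the previous scan.
    """
    state_path = Path(state_file)
    previous = json.loads(state_path.read_text()) if state_path.exists() else {}

    current = {}
    changed = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            mtime_ns = os.stat(path).st_mtime_ns
            current[path] = mtime_ns
            # New files are missing from `previous`; edited files differ.
            if previous.get(path) != mtime_ns:
                changed.append(path)

    # Persist the new snapshot so the next scan can skip unchanged files.
    state_path.write_text(json.dumps(current))
    return changed
```

The first call reports every file; later calls report only files that were added or touched in between, which is why rescanning a large, mostly static collection can be much quicker than the initial scan.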
Install Sist2 using Docker Compose
The installation process is simple using Docker Compose. The only adjustments needed are to the volumes, so that they point at the directories we want to index.
services:
  elasticsearch:
    image: elasticsearch:7.17.9
    restart: unless-stopped
    volumes:
      # This directory must have 1000:1000 permissions (or update PUID & PGID below)
      - /docker/sist2/sist2-es-data:/usr/share/elasticsearch/data
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - "PUID=1000"
      - "PGID=1000"
  sist2-admin:
    image: sist2app/sist2:x64-linux
    restart: unless-stopped
    volumes:
      - /docker/sist2/sist2-admin-data:/sist2-admin
      - /Manuals/iFixit:/ifixit
    ports:
      - 4090:4090
      # NOTE: Don't expose this port publicly!
      - 8080:8080
    working_dir: /root/sist2-admin/
    entrypoint: python3
    command:
      - /root/sist2-admin/sist2_admin/app.py

Be sure to mount the directories you want to index as volumes inside the container. The mount point inside the container can have any name. In my case it is /ifixit, so that is the path I will refer to within the app during the setup process.
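The comment in the compose file says the Elasticsearch data directory must be owned by 1000:1000. Before starting the stack, a quick check like the following Python sketch can confirm that on the host. The helper name `owned_by` is hypothetical, and the path in the usage comment is simply the one from my compose file, so adjust it to yours.

```python
import os


def owned_by(path, uid=1000, gid=1000):
    """Return True if path exists and is owned by the given uid and gid."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        # A missing directory cannot have the right ownership.
        return False
    return st.st_uid == uid and st.st_gid == gid


# Example usage on the Docker host (path taken from the compose file above):
#   if not owned_by("/docker/sist2/sist2-es-data"):
#       print("Fix ownership first, e.g. chown -R 1000:1000 on that directory")
```

If the check fails, either change the ownership of the directory or update the PUID and PGID values in the compose file to match.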
Port 4090 is the frontend web interface and port 8080 is the backend. I advise against exposing the backend publicly. Once the installation is complete, navigate to the backend on port 8080 and create your first Job.
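If you want to confirm which of the two ports can actually be reached from another machine, a small Python sketch like this one works using only the standard library. The function name `port_open` is my own, and the address in the usage comment is a placeholder, not a real server.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example usage from a machine outside your network
# (192.0.2.10 is a placeholder address, replace it with your server's IP):
#   print("frontend 4090 reachable:", port_open("192.0.2.10", 4090))
#   print("backend 8080 reachable:", port_open("192.0.2.10", 8080))
```

Ideally the backend on 8080 is only reachable from machines you trust. Note that port 4090 will not answer until the frontend has been started from the backend, which is covered further down.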

Type the name of the Job and click create. It will then be listed below where you can click on it to adjust settings, tasks and more.

The most important options are the "Search backend" and "Path". You can see I put /ifixit as the Path because that is where the data is located inside the container.
If you'd like to make additional customizations, feel free to scroll down and explore the other options. For instance, selecting the "Only index file names & mime type" option will bypass OCR and thumbnail generation, significantly reducing indexing time. I've kept the default settings for simplicity.
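To give a feel for why a names-and-MIME-only index is so much cheaper than a full one, here is a rough Python sketch of that kind of index: it walks a directory and records only each file's path, name and MIME type, never opening the files. This is an illustration of the idea, not how Sist2 does it. In particular, the function name `index_names_and_mime` is made up, and guessing the type from the file extension is a simplification I chose for the example.

```python
import mimetypes
import os


def index_names_and_mime(root):
    """Build a lightweight index of path, name and guessed MIME type."""
    entries = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in sorted(filenames):
            # Guess from the extension only; file contents are never read,
            # so there is no text extraction, OCR or thumbnailing cost.
            mime, _encoding = mimetypes.guess_type(name)
            entries.append({
                "path": os.path.join(dirpath, name),
                "name": name,
                "mime": mime or "application/octet-stream",
            })
    return entries
```

The trade-off is that you can only search by filename and type afterwards, not by what is written inside the documents.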
Once you have the options you want applied, scroll back to the top and click the blue "Index now" button.

This will queue the task for the Job to be indexed. You can click "Tasks" in the upper right corner to view the progress.

When the scan is complete, navigate to the Frontend tab and click the default setting.

Here I choose Elasticsearch as the backend, then choose the Job we just created, which is ifixit in my case.

Additionally, you can opt in to basic web authentication and integrate Auth0 if desired, but I've kept all other settings at their default values. To proceed, simply click the green "Start" button at the top of the page, which will launch the frontend application. Once launched, visit the server's IP address on port 4090 to access your search index.

The frontend is simple in design, but its search interface returns results quickly and supports fuzzy matching, so small typos in a query still find the right documents. It also offers configuration options that let you tailor the search experience to your needs.
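For the curious, this is roughly what a fuzzy search looks like in the Elasticsearch Query DSL, sketched in Python. I don't know the exact query Sist2 sends, so treat this purely as an illustration: the function name `fuzzy_query` is hypothetical, and the field names "name" and "content" are assumptions about the index rather than something I have confirmed.

```python
import json


def fuzzy_query(text, fields=("name", "content"), size=10):
    """Build an Elasticsearch search body that tolerates small typos."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # Assumed field names; check your index mapping for real ones.
                "fields": list(fields),
                # AUTO lets Elasticsearch pick the allowed edit distance
                # based on the length of each search term.
                "fuzziness": "AUTO",
            }
        },
    }


# Print the request body for a query containing a typo ("batery").
print(json.dumps(fuzzy_query("iphone batery replacement"), indent=2))
```

Keep in mind that the compose file above does not publish the Elasticsearch port, so you would have to expose it yourself before sending a body like this to the search API directly.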
Final Notes and Thoughts
Sist2 is fantastic! I’m so impressed with how well it works and how easy it was to get started. Honestly, I’ve struggled with so many other apps that were either overly complicated or just didn’t have the support I needed. This is exactly what I'd been looking for to index and search files on my server.
If you enjoyed this article and find Sist2 useful, be sure to give the project a star on the Sist2 GitHub repo!



