Running DocsGPT with Dragonfly: Adding AI to Your Documentation

Introduction

It's a fact that developers spend an enormous amount of time in documentation, and we've found that Dragonfly community members are no different. However, we often get questions from community members that are answered in some detail in the documentation. And according to colleagues in tech and my own experience, this issue is not unique to Dragonfly. Whether it's because of a complicated information structure or simply a TL;DR situation, developers often have trouble finding what they need quickly in documentation.

Community Member:
How many backups does '--snapshot_cron' store?
Do I need to manually remove them?

Answer:
By default Dragonfly does not automatically remove old snapshots.
You can choose to overwrite the same file each time by setting '--dbfilename=dump', which omits the timestamp from the file name.

Now, we love engaging with our Dragonfly community as much as any other tech team, but surely there's a way to make docs easier to use. We propose that AI, with the power of large language models (LLMs), could be a great solution. While AI still has a ways to go before it can fulfill some of the loftier goals talked about on a daily basis, gathering specific information from a vat of data is exactly in the wheelhouse of LLMs as they exist today.

So when we came across DocsGPT, we were excited about the possibilities! DocsGPT is a cutting-edge open-source solution designed to streamline the process of finding information within the documentation of a project. By leveraging the power of GPT models, DocsGPT allows developers to easily ask questions about the documentation and receive accurate, helpful answers. DocsGPT typically uses MongoDB and Celery as its backend storage and task queue. Celery, in turn, can use Dragonfly as the message broker and for result storage.

In other words, we thought we might be able to use Dragonfly to run DocsGPT to make Dragonfly's documentation easier to use. And with that nerdy inception, we set out to experiment and find the best way to do this. If it worked, the process could be replicated for other products' documentation as well, and we could write a post explaining how to do it.

Spoiler: It worked!

Setting up DocsGPT with Dragonfly

We often say that you need no code changes to use Dragonfly, and that holds true here. The setup is straightforward and requires only a small configuration change. Follow the steps below to get started.

1. Clone the DocsGPT Project

Begin by cloning the DocsGPT repository from GitHub to your local machine. Open your terminal and run:

git clone git@github.com:arc53/DocsGPT.git
cd DocsGPT

2. Modify the `docker-compose.yaml` File

Next, update the docker-compose.yaml file to replace the Redis image with the Dragonfly image. Open the file in your preferred text editor, find the Redis service section, and change its image line so that the service looks like the following:

redis:
  image: docker.dragonflydb.io/dragonflydb/dragonfly # Replace with Dragonfly image.
  ports:
    - 6379:6379

3. Run the Setup Script

Once you've made the necessary modifications, run the setup script (which works for Linux and macOS) to initialize the DocsGPT environment. If you're using Windows, follow the instructions here. For simplicity, we can choose the DocsGPT public API, which is simple and free, during the setup process. In your terminal, execute:

./setup.sh

# Do you want to:
# 1. Use DocsGPT public API (simple and free)
# 2. Download the language model locally (12GB)
# 3. Use the OpenAI API (requires an API key)
# Enter your choice (1, 2 or 3): 1
# ...

4. Start the Services

After running the setup script, Docker will pull the required images and start the services. This might take a few moments. Once completed, you should have a fully functional DocsGPT instance running locally. That's it! We changed nothing in the codebase except the Docker image URL, and DocsGPT is now using Dragonfly as the message broker and result storage for Celery.

Training DocsGPT with Dragonfly Integrations Docs

Let's try training DocsGPT with the Dragonfly documentation. Specifically, we will upload the Dragonfly Integrations documentation (which is open-sourced here) to DocsGPT and ask questions about it. Firstly, navigate to the sidebar and find the Source Docs options.

Click on the upload icon next to the Source Docs options. Browse and upload the markdown files containing the Dragonfly Integrations documentation.

Click on the Train button to begin the training process. This might take some time, depending on the size of the documentation. We can monitor the training progress, and once it's completed, click the Finish button. Now, the Dragonfly Integrations documentation is ready to use.

Go to New Chat and, from the sidebar, select the documentation we just uploaded. By asking a question about BullMQ, for instance, DocsGPT will provide effective answers based on what it has learned from the Dragonfly documentation!

All the steps above are demonstrated in the 60s video below:

Training DocsGPT with All Dragonfly Documentation

Based on my experiment, DocsGPT does support uploading multiple documents for training. However, if these documents are scattered across different folders, it's not possible to upload all of them using the DocsGPT web interface. The good news is that DocsGPT provides neat and easy-to-follow APIs, allowing us to train multiple documents programmatically. Let's start with the following simple Go snippet, which traverses all markdown files within the docsDir directory and its subfolders.

import (
	"log"
	"os"
	"path/filepath"
	"strings"
)

// Gather all markdown files within the docs directory and its subfolders.
func gatherMarkdownFiles(docsDir string) ([]string, error) {
	var files []string

	err := filepath.Walk(docsDir, func(path string, info os.FileInfo, err error) error {
		// Propagate walk errors; info may be nil when err is non-nil.
		if err != nil {
			return err
		}
		if !info.IsDir() && strings.HasSuffix(info.Name(), ".md") {
			files = append(files, path)
		}
		return nil
	})
	if err != nil {
		log.Printf("error walking the directory %v: %v\n", docsDir, err)
		return nil, err
	}

	return files, nil
}

Once we have all the files, we can create a multipart form request to upload them for training. Note that all files need to be sent within the same request, which is possible because a multipart form can carry multiple file parts. Below is a Go snippet demonstrating how to create such a request.

import (
	"bytes"
	"io"
	"mime/multipart"
	"net/http"
	"os"
	"path/filepath"
)

const docsUploadURL = "http://localhost:7091/api/upload"

// Create a multipart form request to upload the markdown files.
func createRequest(files []string) (*http.Request, error) {
	var (
		reqBody bytes.Buffer
		writer  = multipart.NewWriter(&reqBody)
	)

	// Add form fields.
	if err := writer.WriteField("name", "Dragonfly-All"); err != nil {
		return nil, err
	}
	if err := writer.WriteField("user", "local"); err != nil {
		return nil, err
	}

	// Add files to the request.
	for _, file := range files {
		if err := readFileToWriter(file, writer); err != nil {
			return nil, err
		}
	}

	if err := writer.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest("POST", docsUploadURL, &reqBody)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", writer.FormDataContentType())
	return req, nil
}

// Add a single file to the multipart form.
func readFileToWriter(file string, writer *multipart.Writer) error {
	f, err := os.Open(file)
	if err != nil {
		return err
	}
	defer f.Close()

	part, err := writer.CreateFormFile("file", filepath.Base(file))
	if err != nil {
		return err
	}

	if _, err := io.Copy(part, f); err != nil {
		return err
	}

	return nil
}
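
To make the "multiple file parts in one request" idea concrete, here is a small self-contained sketch that builds a multipart body from in-memory content and then parses it back to list the parts. The doc type, the buildForm and listFileNames helpers, and the file names are hypothetical and used only for illustration; they are not part of DocsGPT.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
)

// doc is an in-memory stand-in for a markdown file on disk.
type doc struct {
	name    string
	content string
}

// buildForm writes every document as its own "file" part into one body
// and returns the body together with the multipart boundary.
func buildForm(docs []doc) (*bytes.Buffer, string, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	for _, d := range docs {
		part, err := w.CreateFormFile("file", d.name)
		if err != nil {
			return nil, "", err
		}
		if _, err := io.WriteString(part, d.content); err != nil {
			return nil, "", err
		}
	}
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return &body, w.Boundary(), nil
}

// listFileNames parses a multipart body and returns the file name of each part.
func listFileNames(body io.Reader, boundary string) ([]string, error) {
	r := multipart.NewReader(body, boundary)
	var names []string
	for {
		part, err := r.NextPart()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, part.FileName())
	}
}

func main() {
	body, boundary, err := buildForm([]doc{
		{name: "bullmq.md", content: "# BullMQ"},
		{name: "sidekiq.md", content: "# Sidekiq"},
	})
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	names, err := listFileNames(body, boundary)
	fmt.Println(names, err)
}
```

Each part repeats the same "file" field name, mirroring readFileToWriter above, so the server sees one form with many files rather than many separate uploads.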

The request can then be sent to DocsGPT for training. Since we are uploading all markdown files from the Dragonfly documentation, the training process may take somewhat longer than it did for the Integrations docs alone. Once the training is complete, you will have a DocsGPT instance enriched with comprehensive knowledge from the Dragonfly documentation, ready to be queried for informative answers.

Conclusion

As it turns out, not only is it possible to make docs easier to use, but integrating DocsGPT with Dragonfly (using Dragonfly) has proven to be pretty easy. In many cases, by replacing Redis with Dragonfly, we can leverage the high performance and compatibility of this modern multi-threaded in-memory data store without requiring any code changes. We definitely recommend trying this for your own or your most-used documentation. It can cut the time to find an answer down to seconds, compared with sifting through many docs pages to get the details you need.

Let us know (and tag us) if you try this out! We would love to hear your feedback and any of your own modifications that make docs easier to use for you. Find us on Discord, Discourse, or GitHub.

And in case you have somehow not been to our docs yet, check them out here! We always want to know how we can make them better and easier to use for you.