Crypto News Aggregation with Dragonfly

Disclaimer: This post focuses on the technical implementation of a high-performance crypto news aggregation system using Dragonfly, applicable to other news aggregation use cases as well. It does not provide investment advice on cryptocurrencies or any other assets.

Exploring Advanced Capabilities with Vector Search

In this series of guides, we’re building a crypto news aggregation system using Dragonfly and Python. In part one of this two-part series, we focused on fundamental functionalities such as ingesting news data, creating efficient indexes, and enabling advanced aggregation by key parameters like frequency or sentiment.

Now, in part two, we’ll shift our focus to vector search, a technique for identifying similar articles by comparing their embeddings. Building on the foundation from part one, we’ll add semantic search to the system so that we can surface related articles even when they share no keywords.

Adding Embeddings to Each Article

First, we’ll add embeddings to each article using Dragonfly’s JSON functionality. We can simply set a new field in the JSON without reading and rewriting the entire object:

def update_article_embedding(client, key, article_content):
    embedding = str(generate_embedding(article_content).tolist())
    client.execute_command("JSON.SET", key, "$.embedding", embedding)
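
The str(...) call above works because Python's list representation of finite floats happens to be valid JSON. As a slightly more defensive variant, the sketch below serializes with json.dumps and rejects NaN or infinite values. The helper name embedding_to_json is hypothetical and not part of the original code.

```python
import json
import math


def embedding_to_json(values):
    """Serialize a sequence of floats as a JSON array string.

    Hypothetical helper: raises ValueError for NaN/inf, which are not valid JSON.
    """
    floats = [float(v) for v in values]
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("embedding contains NaN or infinite values")
    return json.dumps(floats)


print(embedding_to_json([0.5, 1, -2.25]))

# Usage with an already connected client (assumed):
# client.execute_command("JSON.SET", key, "$.embedding", embedding_to_json(vec))
```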

Below is one of the simplest implementations of the generate_embedding function:

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("sentence-transformers/paraphrase-MiniLM-L3-v2")

def generate_embedding(text):
    return model.encode(text, convert_to_numpy=True).astype(np.float32)

Creating an Index for Vector Search

Next, we’ll create an index that supports vector search on these embeddings. The DIM value of 384 matches the output size of the paraphrase-MiniLM-L3-v2 model used above. For details on each parameter of the FT.CREATE command, see the Dragonfly documentation.

client.execute_command(
    "FT.CREATE", "news_vector_idx", "ON", "JSON",
    "PREFIX", 1, "article:",
    "SCHEMA",
    "$.id", "AS", "id", "NUMERIC",
    "$.published_at", "AS", "published_at", "NUMERIC", "SORTABLE",
    "$.title", "AS", "title", "TEXT",
    "$.embedding", "AS", "embedding", "VECTOR", "FLAT", 6,
    "TYPE", "FLOAT32", "DIM", 384, "DISTANCE_METRIC", "COSINE",
)

Running a KNN Search

Rather than matching exact keywords, we search for articles with embeddings closest to the query’s embedding. This returns articles that are semantically similar, even if they don’t contain the exact same words.

def get_similar_articles(client, query_text, from_timestamp, top=10):
    # The query vector is passed as raw float32 bytes.
    query_embedding = generate_embedding(query_text).tobytes()
    # The timestamp filter and the KNN clause must form one query string.
    query = (
        f"@published_at:[{from_timestamp} +inf] "
        f"=>[KNN {top} @embedding $query AS score]"
    )
    return client.execute_command(
        "FT.SEARCH", "news_vector_idx", query,
        "PARAMS", 2, "query", query_embedding,
        "SORTBY", "score", "ASC",
        "LIMIT", 0, top,
        "RETURN", 3, "title", "published_at", "score",
    )
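
The stored embeddings are JSON arrays, but the query vector travels as a binary blob: calling .tobytes() on a float32 NumPy array yields 4 bytes per component. The sketch below reproduces that layout with the standard library so the expected blob size is easy to check against the index's DIM. The helper name pack_float32_vector is hypothetical, and little-endian byte order is assumed (the native order on most common platforms).

```python
import struct


def pack_float32_vector(values, dim):
    """Pack floats into a little-endian float32 blob of exactly `dim` components."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} components, got {len(values)}")
    return struct.pack(f"<{dim}f", *values)


blob = pack_float32_vector([0.0] * 384, dim=384)
print(len(blob))  # 384 components * 4 bytes each
```

Checking the length before sending the query is a cheap way to catch an embedding model whose output size does not match the DIM declared in FT.CREATE.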

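Below are example results for articles similar to the query "Bitcoin Crash". The score is the cosine distance to the query embedding, so lower values indicate closer matches: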

1. b'article::1245195', [b'published_at', b'1737700000', b'title', b'"Bitcoin Crash: Investors Lose Confidence"', b'score', b'0.232883'],
2. b'article::1245194', [b'published_at', b'1737800000', b'title', b'"Bitcoin Plummets 20% After Market Sell-Off"', b'score', b'0.436184']

Here are some example results for articles similar to the query "Ethereum Rally":

1. b'article::1245197', [b'published_at', b'1737800000', b'title', b'"Ethereum 2.0 Launches Successfully"', b'score', b'0.446037'],
2. b'article::1245187', [b'published_at', b'1737890000', b'title', b'"Ethereum Upgrade Brings New Scalability Features"', b'score', b'0.466666']
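
To make the scores above more concrete, here is a minimal brute-force sketch of what a FLAT index with the COSINE metric does conceptually: score every stored vector against the query and keep the k closest. It assumes the score is 1 minus the cosine similarity; the function names and the toy 3-dimensional vectors are invented for illustration and are not Dragonfly's implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # convention for this sketch: zero vectors are unrelated
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_knn(query, documents, k):
    """Score every document against the query and keep the k closest."""
    scored = [(cosine_distance(query, vec), key) for key, vec in documents.items()]
    scored.sort()
    return [(key, score) for score, key in scored[:k]]


# Toy 3-dimensional "embeddings", invented for illustration only.
docs = {
    "article:1": [1.0, 0.0, 0.0],
    "article:2": [0.9, 0.1, 0.0],
    "article:3": [0.0, 1.0, 0.0],
}
nearest = brute_force_knn([1.0, 0.0, 0.0], docs, k=2)
print([key for key, _ in nearest])
```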

Predicting Cryptocurrency Growth

We have explored multiple ways to aggregate and analyze crypto news. Now, we will use this data to predict whether a specific cryptocurrency will grow or decline over a given period. News should not be the only factor used to analyze price movements, but it is a useful signal that influences market sentiment and hints at potential trends. As a reminder, this post does not provide investment advice on cryptocurrencies or any other assets.

Computing Features from News

Before making predictions, we need to extract key features from the news for each cryptocurrency. These features are calculated over a given time frame. For example, when predicting price movements, we can consider the total number of mentions, total positive and negative votes, and mean positive and negative votes. Here’s how we can compute these features:

async def compute_features(client, currency, from_timestamp):
    # The helpers from part one are awaited, so this function must be async.
    recent_mentions = await get_cryptocurrency_recent_mentions(
        client, currency, from_timestamp, 1000
    )
    # The first reply element is assumed to be the number of matching articles.
    count_news = float(recent_mentions[0])
    votes = await get_cryptocurrency_votes(client, currency, from_timestamp)
    votes_dict = to_dict(votes)
    sum_positive = float(votes_dict.get("total_positive_votes", 0))
    sum_negative = float(votes_dict.get("total_negative_votes", 0))
    # Guard against division by zero when there were no mentions.
    mean_positive = sum_positive / count_news if count_news > 0.0 else 0.0
    mean_negative = sum_negative / count_news if count_news > 0.0 else 0.0
    features = [
        count_news,
        sum_positive,
        sum_negative,
        mean_positive,
        mean_negative,
    ]  # etc.
    return np.array(features, dtype=np.float32)
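
As a quick worked example of the arithmetic, the sketch below isolates the feature computation from the Dragonfly calls so it can be checked by hand. The function name build_feature_vector is hypothetical; the input numbers are chosen to reproduce the features array in the sample historical record shown in the next section.

```python
def build_feature_vector(count_news, sum_positive, sum_negative):
    """Return [count, sum_pos, sum_neg, mean_pos, mean_neg] as floats."""
    count = float(count_news)
    pos = float(sum_positive)
    neg = float(sum_negative)
    # Means fall back to 0.0 when there are no mentions in the period.
    mean_pos = pos / count if count > 0.0 else 0.0
    mean_neg = neg / count if count > 0.0 else 0.0
    return [count, pos, neg, mean_pos, mean_neg]


# 10 mentions, 5 positive votes, 2 negative votes.
print(build_feature_vector(10, 5, 2))
```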

Storing Historical Data

For each period of time, we store historical market conditions in Dragonfly, linking news features for that period to whether the cryptocurrency grew or declined. For example, in the case of weekly predictions, a stored historical record may look like this:

{
  "week": "2024-10-27",
  "currency_code": "BTC",
  "features": [10.0, 5.0, 2.0, 0.5, 0.2],
  "label": 1
}

This allows us to train our model by looking at past news sentiment and price movements.
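
A minimal sketch of how such a record could be assembled and stored is shown below. The helper name build_history_record and the key layout week:<date>:<currency> are assumptions made for illustration; the only requirement taken from this post is that the key starts with the week: prefix used by the index in the next section.

```python
import json


def build_history_record(week, currency_code, features, grew):
    """Build (key, JSON string) for one historical period.

    The key layout "week:<date>:<currency>" is an assumption for this sketch.
    """
    record = {
        "week": week,
        "currency_code": currency_code,
        "features": [float(x) for x in features],
        "label": 1 if grew else 0,  # 1 = price grew, 0 = price declined
    }
    return f"week:{week}:{currency_code}", json.dumps(record)


key, payload = build_history_record(
    "2024-10-27", "BTC", [10, 5, 2, 0.5, 0.2], grew=True
)
print(key)

# With an already connected client (assumed):
# client.execute_command("JSON.SET", key, "$", payload)
```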

Predicting Growth

Now that we have historical data, we can compute features for the current period and predict whether the cryptocurrency will grow (label 1) or decline (label 0). There are multiple approaches to making predictions, such as machine learning models or statistical methods. As a simple example, we will use KNN search in Dragonfly to find historical periods with similar market conditions and base our prediction on them.

First, we need to create an index that enables vector similarity search:

client.execute_command(
    "FT.CREATE", "features_idx", "ON", "JSON",
    "PREFIX", 1, "week:",
    "SCHEMA",
    "$.currency_code", "AS", "currency_code", "TAG",
    "$.features", "AS", "features", "VECTOR", "FLAT", 6, "TYPE", "FLOAT32",
    "DIM", 5, "DISTANCE_METRIC", "COSINE",
    "$.label", "AS", "label", "NUMERIC",
)
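
One property of the COSINE metric is worth keeping in mind for this kind of feature vector: it compares direction only, not magnitude. The sketch below, using invented numbers, compares the sample week with a week that has twice the news volume but the same vote ratios. The two look almost identical under cosine distance, while Euclidean distance separates them clearly. Whether that behavior is desirable depends on your strategy; check the FT.CREATE documentation for the metrics your Dragonfly version supports.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Return the straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


quiet_week = [10.0, 5.0, 2.0, 0.5, 0.2]
busy_week = [20.0, 10.0, 4.0, 0.5, 0.2]  # twice the volume, same ratios

print(cosine_distance(quiet_week, busy_week))     # very close to 0
print(euclidean_distance(quiet_week, busy_week))  # clearly non-zero
```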

Now, we can compute features for the current week and predict whether the cryptocurrency will rise or fall by finding the closest historical matches:

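async def predict_growth(client, currency, from_timestamp, k=5):
    new_features = await compute_features(client, currency, from_timestamp)
    query_vector = new_features.tobytes()
    # The tag filter and KNN clause form the whole query, so no "*" is passed.
    query = f"@currency_code:{{{currency}}} =>[KNN {k} @features $query_vec AS score]"
    # If client is an asyncio client, this call needs to be awaited as well.
    search_result = client.execute_command(
        "FT.SEARCH", "features_idx", query,
        "PARAMS", 2, "query_vec", query_vector,
    )

    results = parse_search_result(search_result)

    # Labels may come back as strings or bytes, so convert before voting.
    knn_labels = [int(doc["label"]) for doc in results]
    # Majority vote: predict growth only if more than half the neighbors grew.
    return 1 if sum(knn_labels) > len(knn_labels) / 2 else 0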
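
The parse_search_result helper used above is not shown in this post. The sketch below is one possible version, assuming the flat FT.SEARCH reply shape seen in the earlier example results (a total count followed by alternating keys and field lists) and assuming that, without a RETURN clause, each JSON document comes back whole under the $ field. Treat both as assumptions to verify against your client's actual replies.

```python
import json


def parse_search_result(reply):
    """Turn a flat FT.SEARCH reply into a list of dicts.

    Assumed reply shape: [total, key1, [field, value, ...], key2, [...], ...].
    """
    def text(value):
        return value.decode() if isinstance(value, bytes) else value

    docs = []
    for i in range(1, len(reply) - 1, 2):
        fields = reply[i + 1]
        doc = {
            text(fields[j]): text(fields[j + 1])
            for j in range(0, len(fields) - 1, 2)
        }
        # Assumption: the full JSON document is returned under the "$" field.
        if "$" in doc:
            doc.update(json.loads(doc.pop("$")))
        doc["key"] = text(reply[i])
        docs.append(doc)
    return docs


# Hand-written sample reply, invented for illustration.
sample = [
    2,
    b"week:2024-10-20:BTC", [b"score", b"0.01", b"$", b'{"label": 1}'],
    b"week:2024-10-13:BTC", [b"score", b"0.02", b"$", b'{"label": 0}'],
]
labels = [int(doc["label"]) for doc in parse_search_result(sample)]
print(labels)
```

Note that the majority vote in predict_growth resolves ties to 0, so an odd k avoids ambiguous outcomes.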

This method does not replace full-scale trading analysis, but it does provide valuable insights based on news sentiment. The example shown here is a basic demonstration of how this approach can be used for prediction. Depending on your strategy, you will need to customize data aggregation and processing to align with your specific needs.

Unlocking Advanced News Analysis with Dragonfly

There are many different ways to optimize your crypto data processing. With Dragonfly’s vector search functionality, you can:

Identify clusters of articles covering the same event from multiple news outlets.
Eliminate near duplicates that might share extremely similar text.
Monitor how often a specific topic appears in different sources by combining semantic matching with the rest of your analytics.
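
As an illustration of the near-duplicate use case, here is a minimal in-memory sketch that keeps the first article of each group whose embeddings lie within a cosine-distance threshold. The function names, the toy 2-dimensional vectors, and the 0.05 threshold are all invented for this example; in a real system you would more likely run a KNN query in Dragonfly per incoming article rather than compare every pair in Python.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; zero vectors are treated as unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def drop_near_duplicates(articles, threshold=0.05):
    """Keep the first article of each near-duplicate group.

    `articles` is a list of (key, embedding) pairs; `threshold` is an
    illustrative cosine-distance cutoff, not a recommended value.
    """
    kept = []
    for key, vec in articles:
        # Keep the article only if it is far enough from everything kept so far.
        if all(cosine_distance(vec, kept_vec) > threshold for _, kept_vec in kept):
            kept.append((key, vec))
    return [key for key, _ in kept]


articles = [
    ("article:a", [1.0, 0.0]),
    ("article:b", [0.999, 0.01]),  # nearly the same direction as article:a
    ("article:c", [0.0, 1.0]),
]
print(drop_near_duplicates(articles))
```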

Dragonfly supports a wide range of strategies while remaining easy to set up. Its performance lets traders and developers work with large datasets in real time and react to current market sentiment more quickly.

Beyond crypto, Dragonfly’s adaptability extends to other data-intensive scenarios, such as e-commerce platforms and social media analysis. For organizations seeking a cutting-edge solution for real-time data processing, Dragonfly is an indispensable tool.

Ready to unlock the full potential of Dragonfly? Try it today and experience the difference it can make in your workflow.
