
Backtesting a Reddit-derived strategy using ICE signals and sentiment data

Systematic five-year quintile backtest suggests that a signal built on ICE signals and sentiment data from user-generated content on Reddit may serve as a differentiated input for equity strategies.

March 25, 2026


Reddit has become a noteworthy data source accessible to quantitative finance professionals. Millions of people — including retail investors — discuss, debate and dissect publicly traded companies, commodities and other instruments on the platform every day. Embedded in this user-generated content are market signals that warrant closer examination.

This report presents a five-year quintile backtest evaluating the predictive power of ICE signals and sentiment data from user-generated content on Reddit applied to a subset of U.S. large-cap equities comprising 480 of the largest publicly traded companies and representing approximately 80% of domestic market capitalization as of March 2026. The ICE signals and sentiment data product provides, for each Reddit post or comment in which a covered security is mentioned, three composite scores: negative, neutral and positive sentiment, each expressed as a value between 0 and 1, with the three scores summing to 1.

Alongside these per-mention sentiment scores, the product also captures mention volume, reflecting the frequency with which a given company is referenced in user-generated content. Using this data, we constructed a simple, systematic trading strategy and evaluated its performance over a five-year period.

This material is written with quantitative researchers in mind, particularly those with some experience building and testing trading strategies. It is designed to examine how the ICE signals and sentiment data can be incorporated into existing or new strategies.

The approach was straightforward: the first step was to compute the sentiment for each entity in a post or comment as the positive sentiment score minus the negative sentiment score. The strategy then ranked stocks daily on a cross-sectional basis using the daily sentiment change, standardized to zero mean and unit variance prior to quintile assignment.
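The sentiment construction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not ICE's implementation. The input layout and the function names are hypothetical: each mention is a `(date, ticker, positive, neutral, negative)` tuple, and its date has already been assigned to the Eastern Time calendar day. The z-score is also assumed to use the population standard deviation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def net_sentiment(positive: float, negative: float) -> float:
    """Per-mention sentiment: positive score minus negative score, in [-1, 1]."""
    return positive - negative


def daily_mean_sentiment(mentions):
    """Mean per-mention sentiment for each (date, ticker) pair.

    Hypothetical input: an iterable of (date, ticker, positive, neutral, negative)
    tuples whose date is already the Eastern Time calendar day of the mention.
    """
    buckets = defaultdict(list)
    for day, ticker, pos, _neu, neg in mentions:
        buckets[(day, ticker)].append(net_sentiment(pos, neg))
    return {key: mean(vals) for key, vals in buckets.items()}


def zscore(values: dict) -> dict:
    """Cross-sectional z-score (assumption: population standard deviation)."""
    xs = list(values.values())
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return {k: 0.0 for k in values}
    return {k: (v - mu) / sigma for k, v in values.items()}


def sentiment_change_z(daily: dict, today, yesterday) -> dict:
    """Z-scored day-over-day change in mean sentiment for tickers present on both days."""
    changes = {
        ticker: s - daily[(yesterday, ticker)]
        for (day, ticker), s in daily.items()
        if day == today and (yesterday, ticker) in daily
    }
    return zscore(changes)
```

Tickers without a prior-day reading are simply dropped from the day's cross-section in this sketch.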

Stocks were sorted into five equal-weight portfolios (Q1 through Q5) based on this ranking, with the highest z-scores of sentiment change in the first quintile and the lowest in the fifth quintile. Positions were held intraday, from open to close, on the trading day after the signal was observed. The report evaluated signal quality across all five quintiles, with particular focus on whether an elevated Reddit signal predicted next-day outperformance.
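The quintile sort and the Q5–Q1 spread can be sketched as follows. This is again a hypothetical illustration, not the production code. It makes two assumptions. First, when the universe size is not divisible by five, the remaining names are spread across buckets. Second, `open_px` and `close_px` hold prices for the trading day after the signal day.

```python
def assign_quintiles(z: dict) -> dict:
    """Map ticker -> quintile 1..5; Q1 holds the highest z-scores, Q5 the lowest."""
    ranked = sorted(z, key=z.get, reverse=True)
    n = len(ranked)
    # Assumption: integer bucketing keeps bucket sizes within one name of each other.
    return {ticker: 1 + (5 * i) // n for i, ticker in enumerate(ranked)}


def quintile_returns(quintiles: dict, open_px: dict, close_px: dict) -> dict:
    """Equal-weight open-to-close return of each quintile on the holding day (T+1)."""
    buckets = {q: [] for q in range(1, 6)}
    for ticker, q in quintiles.items():
        buckets[q].append(close_px[ticker] / open_px[ticker] - 1.0)
    return {q: sum(rs) / len(rs) for q, rs in buckets.items() if rs}


def long_short_spread(qret: dict) -> float:
    """Dollar-neutral spread: long Q5 (lowest sentiment change), short Q1 (highest)."""
    return qret[5] - qret[1]
```

Because both legs are equal-weight and equal in size, the long and short books carry the same notional. This matches the dollar-neutral construction described in the article.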

It is important to note that this backtest approach has several key limitations, including, but not limited to, the assumption of no transaction costs, survivorship bias, as well as capacity and liquidity constraints.¹

Key performance metrics
Q1 | CAGR: 3.08% | Sharpe ratio: 0.28 | Sortino ratio: 0.41
Q5 | CAGR: 5.15% | Sharpe ratio: 0.42 | Sortino ratio: 0.62
L/S spread (Q5–Q1) | CAGR: 1.85% | Sharpe ratio: 0.39 | Max drawdown: -7.06%
Signal consistency | Q5 consistently outperforms Q1; positive L/S spread in five of six calendar years

Figure 1: Key performance metrics from October 2020 to September 2025. Note: refer to Glossary for definitions.

Full strategy design and universe construction details are provided in Section 1. Overall, the strategy is dollar-neutral by construction: Q5 is held long, Q1 is held short, and the primary performance measure is the Q5 minus Q1 spread. Quintile portfolio cumulative performance is examined in Section 2, followed by year-by-year long/short spread analysis in Section 3. Conclusions are drawn in Section 4, and a Glossary is provided in Section 5.

1. Strategy design and universe construction

The strategy is constructed to help ensure results are meaningful and easily replicable.

Design principle | Implementation
Backtest period | Oct 5, 2020 – Sep 30, 2025 (five years total, spanning six calendar years)
Reddit raw data | Approx. 957,000,000 posts and comments from all subreddits (for the universe of interest)
Reddit sentiment | Mean of daily sentiment for each stock, calculated up to midnight Eastern Time (ET)
Universe | 480 of the largest publicly traded U.S. stocks as of March 2026 that are covered in the Reddit dataset over the five-year period
Universe filter | Min. 60 daily Reddit mentions; max. 30% day-over-day drop in mentions
Returns | Open-to-close returns, used to exclude after-hours market activity
Signal | Daily sentiment change (z-score normalized; the first quintile contains the highest z-scores of sentiment change and the fifth quintile the lowest)
No look-ahead bias | Signal for day T used only for portfolio construction on T+1
Daily cross-sectional ranking | Stocks ranked relative to peers on each day
Z-score normalization | Signals standardized to mean = 0 and standard deviation = 1
Portfolio construction | Five equal-weight quintiles, rebalanced daily
Minimum stocks per day | 100
Rebalance frequency | Daily
Transaction costs | Set to 0 bps per side

Figure 2: Strategy design
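The universe filter and the T to T+1 lag from Figure 2 might be expressed as below. Both functions are hypothetical, and so is the reading of the filter rules. The 60-mention minimum and the 30% drop limit are assumed to apply on the signal day, relative to the prior day. Tickers with no prior-day count are excluded. A day with fewer than 100 eligible stocks is assumed to be skipped entirely.

```python
def eligible_universe(mentions_today: dict, mentions_prev: dict,
                      min_mentions: int = 60, max_drop: float = 0.30) -> set:
    """Tickers passing the (assumed) universe filter on one signal day.

    Assumption: a ticker needs at least `min_mentions` mentions today and a
    prior-day count, and its mentions may not have fallen by more than `max_drop`.
    """
    keep = set()
    for ticker, count in mentions_today.items():
        prev = mentions_prev.get(ticker, 0)
        if count < min_mentions or prev == 0:
            continue
        if (prev - count) / prev <= max_drop:
            keep.add(ticker)
    return keep


def execution_schedule(trading_days: list, signals_by_day: dict,
                       min_stocks: int = 100) -> list:
    """Pair each signal day T with the next trading day T+1, when positions are held.

    Assumption: days whose eligible cross-section has fewer than `min_stocks`
    names are skipped rather than traded with a smaller book.
    """
    plan = []
    for signal_day, trade_day in zip(trading_days, trading_days[1:]):
        signal = signals_by_day.get(signal_day, {})
        if len(signal) >= min_stocks:
            plan.append((signal_day, trade_day, signal))
    return plan
```

Keeping the signal day and the trade day as an explicit pair makes the no-look-ahead rule easy to audit. Returns for `trade_day` are never used when computing the signal for `signal_day`.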

2. Performance summary

The long-short spread portfolio (Q5–Q1) generated a 1.85% compound annual growth rate (CAGR) with a Sharpe ratio of 0.39 and a max drawdown of -7.06%, suggesting that the signal can discriminate between outperformers and underperformers in a market-neutral construct.

Q5 outperformed Q1, producing a positive long-short spread, in every calendar year except 2025. Q1 and Q5 exhibited consistently higher turnover (~1.7x) than Q2 and Q4 (~1.6x), while Q3 had notably lower turnover (~1.3x), reflecting more stable holdings at the center of the distribution.
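The turnover figures follow the glossary formula. Below is a minimal sketch. It assumes that `changes` counts names added plus names removed, which is the symmetric difference of the two holding sets. Under that reading, the measure runs from 0 for an unchanged portfolio to 2 for a complete replacement. A value near 1.7 would therefore mean that most holdings change from one day to the next.

```python
def daily_turnover(prev_holdings: set, curr_holdings: set) -> float:
    """turnover[t] = |changes| / (0.5 * (|stocks[t-1]| + |stocks[t]|)).

    Assumption: `changes` = names added plus names removed (symmetric difference).
    """
    changes = prev_holdings ^ curr_holdings
    avg_size = 0.5 * (len(prev_holdings) + len(curr_holdings))
    return len(changes) / avg_size if avg_size else 0.0
```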

Cumulative returns - quintiles Q1-Q5

Figure 3: Cumulative returns for each quintile, October 2020 to September 2025. Note: Q1 = highest sentiment change; Q5 = lowest sentiment change.

Figure 3 illustrates the growth of $1 invested in each quintile over the full evaluation period. Q5 and Q4 delivered the best absolute returns, while Q1, Q2 and Q3 persistently underperformed, suggesting that the signal exhibits cross-sectional predictive characteristics within the backtest framework.

3. Year-by-year performance

Figure 4: Line chart of the Q5–Q1 long-short spread on a calendar-year basis. Note: all charts are indexed to begin at 1.0, but the y-axis scale is adjusted in each chart to capture the spread throughout the year. Years 2020 and 2025 are incomplete because the backtest covers October 2020 to September 2025.

The systematic strategy built on ICE signals and sentiment data from Reddit user-generated content delivered its strongest performance in 2021, peaking near $1.06. The spread remained stable and positive through 2024, which trended toward $1.04, while 2025 saw a drawdown in which the sentiment signal underperformed. Although the precise cause cannot be verified, the drawdown coincides with the period leading up to April 2025 and the market volatility that accompanied the imposition of U.S. trade tariffs.

Yearly return - Q5-Q1 spread (compounded)

Figure 5: Bar chart of the Q5–Q1 long-short spread on a calendar-year basis. Note: years 2020 and 2025 are incomplete due to the availability of history at the time of writing.

Figure 5 shows a generally consistent positive spread throughout the five-year backtest window. The Q5–Q1 spread delivered gains in five of the six calendar periods analyzed (2025 being the exception, as noted above), averaging roughly +2% annually, with a strong +4.52% in 2022. The spread became more stable over 2021–2024, and the compounded spread curves across quintiles show a more pronounced signal in 2024–2025 than in 2020–2023.
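Calendar-year results like those in Figures 4 and 5 can be derived from a daily spread series by compounding within each year. The sketch below is illustrative only. The input format, a chronologically ordered list of `(date, return)` pairs, is an assumption. Each year's index restarts at 1.0, as described for Figure 4.

```python
from collections import defaultdict
from datetime import date


def yearly_compounded(daily_spread):
    """Compound daily Q5-Q1 spread returns within each calendar year.

    Hypothetical input: a chronologically ordered list of (datetime.date, return) pairs.
    """
    growth = defaultdict(lambda: 1.0)
    for day, r in daily_spread:
        growth[day.year] *= 1.0 + r
    return {year: g - 1.0 for year, g in sorted(growth.items())}


def in_year_index(daily_spread):
    """Growth of 1.0 within each calendar year, restarting at 1.0 at the start of each year."""
    curves = defaultdict(list)
    level = {}
    for day, r in daily_spread:
        level[day.year] = level.get(day.year, 1.0) * (1.0 + r)
        curves[day.year].append((day, level[day.year]))
    return dict(curves)


if __name__ == "__main__":
    sample = [(date(2021, 1, 4), 0.01), (date(2021, 1, 5), 0.02), (date(2022, 1, 3), -0.01)]
    print(yearly_compounded(sample))
    print(in_year_index(sample)[2021][-1])
```

For partial years such as 2020 and 2025, these functions return the compounded return over the available days only, without annualizing it.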

4. Conclusion

Over five years of backtesting, a signal built on ICE signals and sentiment data from Reddit user-generated content demonstrated potential as a differentiated source of signal-driven return separation in a simulated environment.

Key takeaways:

  • Cross-sectional discrimination: the ordering of quintile returns suggests the signal ranks stocks with some consistency across the backtest period. Q5 and Q4 outperformed, while Q1, Q2 and Q3 persistently lagged.
  • Complementary signal: the signal may be combined with other strategies to diversify returns and reduce their correlation.
  • Historical consistency: spread consistency across 2020–2024 suggests the signal is not an artifact of a single market event but reflects a recurring pattern in retail sentiment during the backtest period.

Key limitations:

  • Transaction costs not modeled: the backtest assumes 0 bps trading costs. Daily rebalancing would incur meaningful slippage, market impact and commissions in a live implementation.
  • Survivorship and selection bias: the universe is drawn from the largest U.S. stocks as of March 2026, which excludes companies that were delisted or fell out of that group during the backtest period. The filter of at least 60 daily Reddit mentions also biases the sample toward actively discussed companies. Both effects could overstate signal performance.
  • Capacity: equal-weighted portfolios in smaller names may face capacity and liquidity constraints at scale.

5. Glossary

Term | Definition
Backtest period | Historical date range used to simulate the strategy, construct portfolios and measure performance
Z-score normalization | (signal − mean) / standard deviation of signal; standardizes signals for cross-sectional comparison
Transaction costs | Implicit costs (bid-ask spread, market impact) plus explicit fees (taxes, commissions) incurred when rebalancing
Rebalance | Reconstitution of portfolios based on updated signal rankings (daily in this backtest)
Quintile | Grouping that divides a ranked (z-scored) dataset into five equal-sized buckets: Q1 = highest 20%, Q2 = next 20%, Q3 = middle 20%, Q4 = next 20%, Q5 = lowest 20%
L/S spread (Q5–Q1) | Q5 return minus Q1 return for each business day; measures the spread between the highest and lowest quintile portfolios
Daily turnover | Formula: turnover[t] = |changes| / (0.5 × (|stocks[t−1]| + |stocks[t]|)); measure of daily trading activity
CAGR | Formula: (1 + cumulative return)^(1/years) − 1; compound annual growth rate over the evaluation window
Annualized vol | Formula: standard deviation(daily returns) × sqrt(252); annualized using the population standard deviation
Sharpe ratio | Formula: annualized return / annualized vol; measures risk-adjusted performance on a total-risk basis
Annualized downside vol | Formula: sqrt(mean(min(r, 0)^2)) × sqrt(252), where r is the daily return; annualized downside deviation
Sortino ratio | Formula: annualized return / annualized downside vol; measures downside-risk-adjusted performance
Maximum drawdown (MaxDD) | Largest percentage decline from a portfolio's highest peak to its lowest subsequent trough
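The glossary formulas translate directly into code. The sketch below works on a plain list of daily returns and rests on several assumptions. It takes "annualized return" to mean the CAGR, because the glossary does not say which annualization is used. It deducts no risk-free rate. When no explicit length is given, it measures the period in years as the number of daily observations divided by 252.

```python
import math
from statistics import pstdev

TRADING_DAYS = 252


def _years(daily_returns, years):
    # Assumption: without an explicit length, years = daily observations / 252.
    return years if years is not None else len(daily_returns) / TRADING_DAYS


def cagr(daily_returns, years=None):
    """(1 + cumulative return) ** (1 / years) - 1."""
    growth = math.prod(1.0 + r for r in daily_returns)
    return growth ** (1.0 / _years(daily_returns, years)) - 1.0


def annualized_vol(daily_returns):
    """Population standard deviation of daily returns, scaled by sqrt(252)."""
    return pstdev(daily_returns) * math.sqrt(TRADING_DAYS)


def annualized_downside_vol(daily_returns):
    """sqrt(mean(min(r, 0) ** 2)) * sqrt(252)."""
    msq = sum(min(r, 0.0) ** 2 for r in daily_returns) / len(daily_returns)
    return math.sqrt(msq) * math.sqrt(TRADING_DAYS)


def sharpe(daily_returns, years=None):
    # Assumption: annualized return = CAGR; no risk-free rate is deducted.
    return cagr(daily_returns, years) / annualized_vol(daily_returns)


def sortino(daily_returns, years=None):
    # Same assumption as sharpe(), with downside vol in the denominator.
    return cagr(daily_returns, years) / annualized_downside_vol(daily_returns)


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded value path (a negative number)."""
    value = peak = 1.0
    worst = 0.0
    for r in daily_returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst
```

The downside deviation divides by the total number of observations, not only the negative ones, which follows the `mean(min(r, 0)^2)` form given in the glossary.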

Contact us with any questions you may have.


1. Backtest results are hypothetical and do not represent actual trading performance.

Full Disclaimer
This material contains information that is proprietary property of Intercontinental Exchange, Inc. and/or its affiliates (the "ICE Group"), and is not to be published, reproduced, copied, disclosed or used without the express written consent of ICE Group.

This material is provided for informational purposes only. The information contained herein is subject to change and does not constitute any form of warranty, representation, or undertaking. Nothing herein should in any way be deemed to alter the legal rights and obligations contained in agreements between ICE Group and their respective clients relating to any of the products or services described herein. Nothing herein is intended to constitute legal, tax, accounting, investment or other professional advice.

ICE Group makes no warranties whatsoever, either express or implied, as to merchantability, fitness for a particular purpose, or any other matter. Without limiting the foregoing, ICE Group makes no representation or warranty that any data or information supplied to or by it are complete or free from errors, omissions, or defects and nothing contained herein should constitute any form of warranty, representation, or undertaking.

ICE Data Services refers to a group of products and services offered by certain Intercontinental Exchange, Inc. (NYSE:ICE) companies and is the marketing name used for ICE Data Services, Inc. and its subsidiaries globally, including ICE Data Indices, LLC, ICE Data Pricing & Reference Data, LLC, ICE Data Services Europe Limited and ICE Data Services Australia Pty Ltd. ICE Data Services is also the marketing name used for ICE Data Derivatives, Inc., ICE Data Analytics, LLC, and certain other data products and services offered by other affiliates of Intercontinental Exchange, Inc. (NYSE:ICE).

Without limiting the foregoing general disclaimers, Signals and Sentiment Data is provided on an 'as is' and 'as available' basis, with all faults and at the user's own risk. Derived from third-party and user-generated sources, it may be affected by factors including participant biases, operational errors, system failures, price volatility, and potential market manipulation. Signals and Sentiment Data represents point-in-time outputs only. Certain historical data may be subject to periodic updates or revisions, including as a result of recalibration of models, enhancements or methodologies, or expanded instrument coverage. Such updates may result in differences from previously provided data. Except as set forth in its agreements with third party providers, ICE Group does not undertake to update or revise it to reflect subsequently available information. ICE Group does not control, endorse, or independently verify Signals and Sentiment Data. Signals and Sentiment Data does not constitute investment advice, trading recommendations, or a guarantee of future performance, and may not be used in connection with political advertising or electioneering. Signals and Sentiment Data reflects participant sentiment and trading activity only and does not constitute evidence, validation, or endorsement of any allegation, assertion, or political outcome by ICE Group. Users are solely responsible for understanding the limitations of the Signals and Sentiment Data, determining its utility, and for any decisions made in reliance thereon.

Trademarks of ICE Group include: Intercontinental Exchange, ICE, ICE block design, NYSE, ICE Data Services, ICE Data and New York Stock Exchange. Information regarding additional trademarks and intellectual property rights of ICE Group is located at https://www.ice.com/privacy-security-center/terms-of-use. Other products, services, or company names mentioned herein are the property of, and may be the service mark or trademark of, their respective owners.

© 2026 Intercontinental Exchange, Inc.
