A systematic five-year quintile backtest suggests that the ICE sentiment signal derived from user-generated content on Reddit may serve as a differentiated input for equity strategies.
March 25, 2026
Reddit has become a noteworthy data source accessible to quantitative finance professionals. Millions of people, including retail investors, discuss, debate and dissect publicly traded companies, commodities and other instruments on the platform every day. Embedded in this user-generated content are market signals that warrant closer examination.
This report presents a five-year quintile backtest evaluating the predictive power of ICE signals and sentiment data derived from user-generated content on Reddit. The backtest covers a subset of U.S. large-cap equities: 480 of the largest publicly traded companies, representing approximately 80% of domestic market capitalization as of March 2026. For each Reddit post or comment in which a covered security is mentioned, the ICE signals and sentiment data product provides three composite scores (negative, neutral and positive sentiment), each expressed as a value between 0 and 1, with the three scores summing to 1.
Alongside these per-mention sentiment scores, the product also captures mention volume, reflecting the frequency with which a given company is referenced in user-generated content. Using this data, we constructed a simple, systematic trading strategy and evaluated its performance over a five-year period.
This material is written with quantitative researchers in mind, particularly those with some experience building and testing trading strategies. It is designed to examine how the ICE signals and sentiment data can be incorporated into existing or new strategies.
The approach was straightforward: the first step was to compute the sentiment for each entity in a post or comment as the positive sentiment score minus the negative sentiment score. The strategy then ranked stocks daily on a cross-sectional basis using the daily sentiment change, standardized to zero mean and unit variance prior to quintile assignment.
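The signal construction described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the product's implementation: the `(date, ticker, positive, negative)` record layout is hypothetical, "daily sentiment change" is assumed to be a simple day-over-day difference, and the z-score is assumed to use the population standard deviation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_mean_sentiment(mentions):
    """Average net sentiment (positive minus negative) per (date, ticker).

    `mentions` is an iterable of (date, ticker, positive, negative) tuples.
    This record layout is hypothetical, not the product's schema.
    """
    buckets = defaultdict(list)
    for date, ticker, positive, negative in mentions:
        buckets[(date, ticker)].append(positive - negative)
    return {key: mean(scores) for key, scores in buckets.items()}


def sentiment_change(today, yesterday):
    """Day-over-day change for tickers present on both days.

    Assumption: 'change' is a simple difference, not a percentage change.
    """
    return {t: today[t] - yesterday[t] for t in today if t in yesterday}


def cross_sectional_zscores(values):
    """Standardize one day's {ticker: value} map to mean 0 and std 1."""
    data = list(values.values())
    mu, sigma = mean(data), pstdev(data)
    if sigma == 0:  # all stocks identical: no ranking information
        return {t: 0.0 for t in values}
    return {t: (v - mu) / sigma for t, v in values.items()}


# Made-up mentions for three placeholder tickers over two days.
mentions = [
    ("2021-03-01", "AAA", 0.6, 0.1), ("2021-03-01", "AAA", 0.4, 0.2),
    ("2021-03-01", "BBB", 0.2, 0.5), ("2021-03-01", "CCC", 0.3, 0.3),
    ("2021-03-02", "AAA", 0.5, 0.3), ("2021-03-02", "BBB", 0.5, 0.2),
    ("2021-03-02", "CCC", 0.4, 0.3),
]
daily = daily_mean_sentiment(mentions)
day1 = {t: s for (d, t), s in daily.items() if d == "2021-03-01"}
day2 = {t: s for (d, t), s in daily.items() if d == "2021-03-02"}
zscores = cross_sectional_zscores(sentiment_change(day2, day1))
```

In this toy data, BBB shows the largest improvement in net sentiment and therefore receives the highest z-score, while AAA receives the lowest.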
Stocks were sorted into five equal-weight portfolios (Q1 through Q5) based on this ranking, with the highest z-scores of sentiment change in the first quintile and the lowest in the fifth. Positions were held intraday from open to close. The report evaluated signal quality across all five quintiles, with particular focus on whether an elevated Reddit signal predicted next-day outperformance.
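A minimal sketch of the sorting step, assuming ties are broken by input order and that bucket sizes differ by at most one stock when the universe is not divisible by five (neither detail is specified in the report). The tickers, scores and prices are made up for illustration.

```python
from statistics import mean


def assign_quintiles(zscores):
    """Map {ticker: z} to {ticker: quintile number}, Q1 = highest z-scores."""
    ranked = sorted(zscores, key=zscores.get, reverse=True)
    n = len(ranked)
    # Position i in the descending ranking falls into bucket floor(5 * i / n) + 1.
    return {ticker: (5 * i) // n + 1 for i, ticker in enumerate(ranked)}


def open_to_close_return(open_price, close_price):
    """Intraday return from the open to the close of the same session."""
    return close_price / open_price - 1.0


def quintile_returns(quintiles, returns):
    """Equal-weight (simple average) return of each quintile portfolio."""
    grouped = {}
    for ticker, q in quintiles.items():
        grouped.setdefault(q, []).append(returns[ticker])
    return {q: mean(rs) for q, rs in sorted(grouped.items())}


# Ten hypothetical tickers: T0 has the highest score, T9 the lowest.
zscores = {f"T{i}": 2.0 - 0.4 * i for i in range(10)}
prices = {f"T{i}": (100.0, 100.0 + i) for i in range(10)}  # (open, close)
quintiles = assign_quintiles(zscores)
returns = {t: open_to_close_return(o, c) for t, (o, c) in prices.items()}
by_quintile = quintile_returns(quintiles, returns)
```

Here `by_quintile[5] - by_quintile[1]` is one day's Q5 minus Q1 return under these assumptions.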
This backtest approach has several key limitations, including but not limited to the assumption of no transaction costs, survivorship bias, and capacity and liquidity constraints (see note 1).
| Key performance metrics | |
|---|---|
| Q1 | CAGR: 3.08%; Sharpe ratio: 0.28; Sortino ratio: 0.41 |
| Q5 | CAGR: 5.15%; Sharpe ratio: 0.42; Sortino ratio: 0.62 |
| L/S spread (Q5–Q1) | CAGR: 1.85%; Sharpe ratio: 0.39; max drawdown: -7.06% |
| Signal consistency | Q5 consistently outperforms Q1; positive L/S spread in five of six calendar years |
Figure 1: Key performance metrics from October 2020 to September 2025. Note: refer to Glossary for definitions.
Full strategy design and universe construction details are provided in Section 1. Overall, the strategy is dollar-neutral by construction: Q5 is held long, Q1 is held short, and the primary performance measure is the Q5 minus Q1 spread. Quintile portfolio cumulative performance is examined in Section 2, followed by year-by-year long/short spread analysis in Section 3. Conclusions are drawn in Section 4, and a Glossary is provided in Section 5.
The strategy is constructed to help ensure results are meaningful and easily replicable.
| Design principle | Implementation |
|---|---|
| Backtest period | Oct 5, 2020 – Sep 30, 2025 (five years total, spanning six calendar years) |
| Reddit raw data | Approx. 957,000,000 posts and comments from all subreddits (for the universe of interest) |
| Reddit sentiment | Mean of daily sentiment for each stock calculated until midnight Eastern Time (ET) |
| Universe | 480 of the largest publicly traded U.S. stocks as of March 2026 covered in the Reddit dataset over the five-year period |
| Universe filter | Min. 60 daily Reddit mentions; max. 30% mention drop day over day |
| Returns | Open to close returns are used to exclude after-hours market activity |
| Signal | Sentiment daily change (z-score normalized, with first quintile containing the highest z-score of sentiment change and the fifth quintile the lowest z-score of sentiment change) |
| No look-ahead bias | Signal for day T used only for portfolio construction on T+1 |
| Daily cross-sectional ranking | Stocks ranked relative to peers on each day |
| Z-score normalization | Signals standardized to mean = 0 and standard deviation = 1 |
| Portfolio construction | Five equal-weight quintiles, daily rebalanced |
| Minimum stocks per day | 100 |
| Rebalance frequency | Daily |
| Transaction costs | Set to 0 bps per side |
Figure 2: Strategy design
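The universe filter and the one-day signal lag in Figure 2 could be expressed roughly as follows. This is a sketch under stated assumptions: the "max. 30% mention drop" rule is read as excluding a stock whose mention count fell by more than 30% versus the prior day, the drop check is skipped when no prior-day count exists, a day with fewer than 100 eligible stocks is not traded, and consecutive keys in the signal map are consecutive trading days. All function names are illustrative.

```python
def passes_filter(mentions_today, mentions_prev, min_mentions=60, max_drop=0.30):
    """Return True if a stock meets the mention-volume filter for the day."""
    if mentions_today < min_mentions:
        return False
    if mentions_prev:  # assumption: skip the drop check without a prior-day count
        drop = (mentions_prev - mentions_today) / mentions_prev
        if drop > max_drop:
            return False
    return True


def tradable_universe(counts_today, counts_prev, min_stocks=100):
    """Eligible tickers for the day, or an empty list if too few qualify."""
    eligible = [
        t for t, count in counts_today.items()
        if passes_filter(count, counts_prev.get(t))
    ]
    return eligible if len(eligible) >= min_stocks else []


def lag_one_day(signal_by_date):
    """Pair each trading date with the previous date's signal (T -> T+1)."""
    dates = sorted(signal_by_date)  # ISO date strings sort chronologically
    return {trade: signal_by_date[sig] for sig, trade in zip(dates, dates[1:])}


# Made-up mention counts for three placeholder tickers.
today = {"AAA": 100, "BBB": 80, "CCC": 10}
previous = {"AAA": 120, "BBB": 120, "CCC": 10}
small_universe = tradable_universe(today, previous, min_stocks=1)
lagged = lag_one_day({"2020-10-05": "signal_T", "2020-10-06": "signal_T+1"})
```

Keying portfolio construction on `lagged` rather than on the raw signal map is one simple way to make the T to T+1 separation explicit in code.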
The long-short spread portfolio (Q5–Q1) generated a 1.85% compound annual growth rate (CAGR) with a Sharpe ratio of 0.39 and a max drawdown of -7.06%, highlighting the signal's ability to discriminate between outperformers and underperformers in a market-neutral construct.
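For reference, the spread series and its maximum drawdown can be computed as in the sketch below. It follows the Glossary definition of the L/S spread (Q5 return minus Q1 return for each business day) and assumes the cumulative curve compounds daily spread returns from a starting value of 1.0. The return values are made up for illustration.

```python
def long_short_spread(q5_returns, q1_returns):
    """Daily L/S spread: Q5 return minus Q1 return for each business day."""
    return [r5 - r1 for r5, r1 in zip(q5_returns, q1_returns)]


def cumulative_growth(returns, start=1.0):
    """Compound a series of daily returns into a growth-of-$1 curve."""
    curve, value = [], start
    for r in returns:
        value *= 1.0 + r
        curve.append(value)
    return curve


def max_drawdown(curve, start=1.0):
    """Largest peak-to-trough decline of the curve, as a negative fraction."""
    peak, worst = start, 0.0
    for value in curve:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


# Three hypothetical days of quintile returns.
q5 = [0.01, -0.02, 0.03]
q1 = [0.00, 0.01, 0.01]
spread = long_short_spread(q5, q1)
curve = cumulative_growth(spread)
mdd = max_drawdown(curve)
```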
The separation between Q5 and Q1 was consistent with the long-short spread's positive performance in all years except 2025. Q1 and Q5 exhibited consistently higher turnover (~1.7x) compared to Q2 and Q4 (~1.6x). Q3 had notably lower turnover (~1.3x), reflecting more stable holdings at the center.
Figure 3: Cumulative returns for each quintile, October 2020 to September 2025. Note: Q1 = high sentiment.
Figure 3 illustrates the growth of $1 invested in each quintile over the full evaluation period. Q5 and Q4 delivered the highest absolute returns, while Q1, Q2 and Q3 persistently underperformed, suggesting the signal exhibits cross-sectional predictive characteristics within the backtest framework.
Figure 4: Line chart: Q5–Q1 long-short spread on a calendar-year basis. Note: All charts are indexed to begin at 1.0, but Y axis scale is adjusted in each chart to capture the spread throughout the year. Years 2020 and 2025 are incomplete due to the availability of history at the time of writing.
The systematic strategy built on ICE signals and sentiment data from Reddit user-generated content delivered its strongest performance in 2021, with the spread curve peaking near $1.06. The spread was stable and positive, with 2024 trending toward $1.04, while 2025 saw a drawdown in which sentiment signals underperformed. Although it is impossible to verify the precise cause, the drawdown aligns with the period leading up to April 2025 and the market volatility coinciding with the imposition of U.S. trade tariffs.
Figure 5: Bar chart: Q5–Q1 long-short spread on a calendar-year basis. Note: Years 2020 and 2025 are incomplete due to the availability of history at the time of writing.
Figure 5 shows a generally consistent positive spread throughout the five-year backtest window. The Q5–Q1 spread delivered gains in five of the six calendar-year periods analyzed (2025 being the exception, as mentioned above), averaging roughly +2% annually, with a strong +4.52% in 2022. Stability in the spreads grew stronger between 2021 and 2024, with compounding spread curves across quintiles showing a more pronounced signal in 2024-2025 compared to the 2020-2023 period.
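A calendar-year breakdown of this kind can be reproduced from a dated series of daily spread returns. The sketch below assumes each year's figure is obtained by compounding that year's daily spreads from a fresh starting value of 1.0, which matches the note that the yearly charts are indexed to begin at 1.0. The aggregation actually used for the figures is not stated, and the sample values are made up.

```python
from collections import defaultdict


def calendar_year_spread(dated_spreads):
    """Compound daily spread returns within each calendar year.

    `dated_spreads` is an iterable of (ISO date string, daily spread return).
    Returns {year: compounded return}, each year restarting from 1.0.
    """
    growth = defaultdict(lambda: 1.0)
    for date, r in dated_spreads:
        growth[date[:4]] *= 1.0 + r  # first four characters are the year
    return {year: g - 1.0 for year, g in sorted(growth.items())}


# Hypothetical daily spreads spanning two calendar years.
sample = [("2021-01-04", 0.01), ("2021-01-05", 0.01), ("2022-01-03", -0.02)]
by_year = calendar_year_spread(sample)
positive_years = sum(1 for r in by_year.values() if r > 0)
```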
Over five years of backtesting, the ICE sentiment signal derived from Reddit user-generated content has demonstrated potential as a differentiated source of signal-driven return separation in a simulated environment.
Key takeaways:
Key limitations:
| Term | Definition |
|---|---|
| Backtest period | Historical date range used to simulate the strategy, construct portfolios, and measure performance |
| Z-score normalization | (Signal − mean) / standard deviation of signal; standardizes signals for cross-sectional comparison |
| Transaction costs | Implicit costs (bid-ask spread, market impact) plus explicit fees (taxes, commissions) incurred when rebalancing |
| Rebalance | Frequency at which portfolios are reconstituted based on updated signal rankings |
| Quintile | Grouping that divides a ranked (z-scored) dataset into five equal-sized buckets: Q1 = Highest 20%, Q2 = Next 20%, Q3 = Middle 20%, Q4 = Next 20%, Q5 = Lowest 20% |
| L/S spread (Q5–Q1) | Q5 return minus Q1 return for each business day; measures spread between highest and lowest quintile portfolios |
| Daily turnover | Measure of daily trading activity. Formula: turnover[t] = \|changes\| / (0.5 × (\|stocks[t-1]\| + \|stocks[t]\|)) |
| CAGR | Compound annual growth rate over the evaluation window. Formula: (cumulative return)^(1/years) - 1 |
| Annualized vol | Annualized using population standard deviation. Formula: standard deviation(daily returns) × sqrt(252) |
| Sharpe ratio | Measures risk-adjusted performance on a total-risk basis. Formula: annualized return / annualized vol |
| Annualized downside vol | Annualized downside deviation. Formula: sqrt(mean(min(r, 0)^2)) × sqrt(252) |
| Sortino ratio | Measures downside-risk-adjusted performance. Formula: annualized return / annualized downside vol |
| Maximum drawdown (MaxDD) | Largest percentage decline from a portfolio’s highest peak to its lowest subsequent trough |
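The Glossary formulas translate fairly directly into code. The sketch below makes several assumptions that the Glossary leaves open: "cumulative return" is read as the ending value of $1 invested, "years" as the number of daily observations divided by 252, "annualized return" as the CAGR, and `|changes|` as the number of names that entered or left the portfolio. Zero-volatility inputs are not handled.

```python
import math
from statistics import pstdev

TRADING_DAYS = 252


def cagr(daily_returns):
    """(cumulative return)^(1/years) - 1, with cumulative return as growth of $1."""
    growth = math.prod(1.0 + r for r in daily_returns)
    years = len(daily_returns) / TRADING_DAYS
    return growth ** (1.0 / years) - 1.0


def annualized_vol(daily_returns):
    """Population standard deviation of daily returns times sqrt(252)."""
    return pstdev(daily_returns) * math.sqrt(TRADING_DAYS)


def annualized_downside_vol(daily_returns):
    """sqrt(mean(min(r, 0)^2)) times sqrt(252)."""
    mean_sq = sum(min(r, 0.0) ** 2 for r in daily_returns) / len(daily_returns)
    return math.sqrt(mean_sq) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(daily_returns):
    """Annualized return (assumed to be CAGR) over annualized vol."""
    return cagr(daily_returns) / annualized_vol(daily_returns)


def sortino_ratio(daily_returns):
    """Annualized return (assumed to be CAGR) over annualized downside vol."""
    return cagr(daily_returns) / annualized_downside_vol(daily_returns)


def daily_turnover(prev_holdings, curr_holdings):
    """|changes| / (0.5 * (|stocks[t-1]| + |stocks[t]|)).

    Assumption: |changes| counts names that entered or left the portfolio.
    """
    changes = len(set(prev_holdings) ^ set(curr_holdings))
    return changes / (0.5 * (len(prev_holdings) + len(curr_holdings)))


# One hypothetical year of alternating +1% / -1% daily returns.
sample = [0.01, -0.01] * 126
vol = annualized_vol(sample)
downside = annualized_downside_vol(sample)
turnover = daily_turnover({"AAA", "BBB", "CCC"}, {"BBB", "CCC", "DDD"})
```

Under this reading of `|changes|`, a full replacement of an equal-sized portfolio gives a turnover of 2.0.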
1. Backtest results are hypothetical and do not represent actual trading performance.
Full Disclaimer
This material contains information that is proprietary property of Intercontinental Exchange, Inc. and/or its affiliates (the "ICE Group"), and is not to be published, reproduced, copied, disclosed or used without the express written consent of ICE Group.
This material is provided for informational purposes only. The information contained herein is subject to change and does not constitute any form of warranty, representation, or undertaking. Nothing herein should in any way be deemed to alter the legal rights and obligations contained in agreements between ICE Group and their respective clients relating to any of the products or services described herein. Nothing herein is intended to constitute legal, tax, accounting, investment or other professional advice.
ICE Group makes no warranties whatsoever, either express or implied, as to merchantability, fitness for a particular purpose, or any other matter. Without limiting the foregoing, ICE Group makes no representation or warranty that any data or information supplied to or by it are complete or free from errors, omissions, or defects and nothing contained herein should constitute any form of warranty, representation, or undertaking.
ICE Data Services refers to a group of products and services offered by certain Intercontinental Exchange, Inc. (NYSE:ICE) companies and is the marketing name used for ICE Data Services, Inc. and its subsidiaries globally, including ICE Data Indices, LLC, ICE Data Pricing & Reference Data, LLC, ICE Data Services Europe Limited and ICE Data Services Australia Pty Ltd. ICE Data Services is also the marketing name used for ICE Data Derivatives, Inc., ICE Data Analytics, LLC, and certain other data products and services offered by other affiliates of Intercontinental Exchange, Inc. (NYSE:ICE).
Without limiting the foregoing general disclaimers, Signals and Sentiment Data is provided on an 'as is' and 'as available' basis, with all faults and at the user's own risk. Derived from third-party and user-generated sources, it may be affected by factors including participant biases, operational errors, system failures, price volatility, and potential market manipulation. Signals and Sentiment Data represents point-in-time outputs only. Certain historical data may be subject to periodic updates or revisions, including as a result of recalibration of models, enhancements or methodologies, or expanded instrument coverage. Such updates may result in differences from previously provided data. Except as set forth in its agreements with third party providers, ICE Group does not undertake to update or revise it to reflect subsequently available information. ICE Group does not control, endorse, or independently verify Signals and Sentiment Data. Signals and Sentiment Data does not constitute investment advice, trading recommendations, or a guarantee of future performance, and may not be used in connection with political advertising or electioneering. Signals and Sentiment Data reflects participant sentiment and trading activity only and does not constitute evidence, validation, or endorsement of any allegation, assertion, or political outcome by ICE Group. Users are solely responsible for understanding the limitations of the Signals and Sentiment Data, determining its utility, and for any decisions made in reliance thereon.
Trademarks of ICE Group include: Intercontinental Exchange, ICE, ICE block design, NYSE, ICE Data Services, ICE Data and New York Stock Exchange. Information regarding additional trademarks and intellectual property rights of ICE Group is located at https://www.ice.com/privacy-security-center/terms-of-use. Other products, services, or company names mentioned herein are the property of, and may be the service mark or trademark of, their respective owners.
© 2026 Intercontinental Exchange, Inc.