📋 AI Review from DeepReviewer will be automatically processed
📋 AI Review from ZGCA will be automatically processed
The paper presents EPInformer, a transformer-based framework for predicting gene expression by explicitly modeling promoter–enhancer interactions. The architecture comprises: (i) a CNN-based sequence encoder (residual + dilated convolutions) that learns embeddings from promoter (2 kb around the TSS) and putative enhancer sequences (open chromatin within 100 kb of the TSS); (ii) a fusion layer that integrates epigenomic signals (e.g., DNase, H3K27ac), distance, and optional Hi-C contacts with the sequence embeddings; (iii) an interaction encoder with multi-head attention that focuses solely on promoter–enhancer interactions via masking (ignoring enhancer–enhancer interactions); and (iv) a predictor that combines the promoter representation with genomic covariates (e.g., mRNA half-life features, promoter H3K27ac) to predict CAGE-seq or RNA-seq expression. The sequence encoder can be pre-trained on enhancer activity (the geometric mean of H3K27ac and DNase).

The paper evaluates multiple variants: EPInformer-promoter, EPInformer-PE (promoter + enhancer sequences + distance), EPInformer-PE-Activity (adds DNase/H3K27ac), and EPInformer-PE-Activity-HiC (adds Hi-C). Under 12-fold cross-chromosome validation, EPInformer-PE-Activity-HiC achieves high Pearson correlations for CAGE (K562: 0.875; GM12878: 0.891), outperforming Seq-GraphReg, and surpasses Xpresso for RNA-seq. On Enformer’s hold-out set, EPInformer variants exceed Enformer’s PearsonR by 7.3% (K562) and 9.1% (GM12878).

Attention scores prioritize CRISPRi-FlowFISH-validated enhancers better than ABC, distance, activity, and Hi-C baselines; combining attention with ABC (attention-ABC) further improves AUPRC. In-silico perturbations at KLF1 correlate strongly with CRISPRi effects (PearsonR = 0.88). TF-MoDISco-lite and TangerMEME applied to attention-prioritized enhancers recover known, cell-type-specific TF motifs (e.g., GATA1/2 in K562; SPI1 in GM12878). The model is lightweight (447,149 parameters; ~0.2% of Enformer) and trains in ~1 hour on an A100 GPU.
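The fusion step and pretraining target summarized above can be sketched in NumPy. This is a hedged illustration only: the function names (`enhancer_activity`, `fuse_features`) and the exact concatenation order are assumptions, not the authors' implementation.

```python
import numpy as np

def enhancer_activity(h3k27ac, dnase):
    """Pretraining target: geometric mean of H3K27ac and DNase signals."""
    return np.sqrt(np.asarray(h3k27ac, float) * np.asarray(dnase, float))

def fuse_features(seq_emb, dnase, h3k27ac, distance_kb, hic=None):
    """Concatenate one enhancer's sequence embedding with its epigenomic
    signals, TSS distance, and (optionally) a Hi-C contact value."""
    extras = [dnase, h3k27ac, distance_kb]
    if hic is not None:
        extras.append(hic)  # the -HiC variants add this covariate
    return np.concatenate([np.asarray(seq_emb, float),
                           np.asarray(extras, float)])

# Toy example: a 4-dim sequence embedding plus three scalar covariates.
fused = fuse_features(seq_emb=[0.1, 0.4, -0.2, 0.3],
                      dnase=2.0, h3k27ac=1.5, distance_kb=35.0)
print(fused.shape)  # (7,)
```

In the actual model the sequence embedding would come from the CNN encoder; here it is a hand-written vector purely for illustration.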
Cross‑Modal Consistency: 30/50
Textual Logical Soundness: 22/30
Visual Aesthetics & Clarity: 13/20
Overall Score: 65/100
Detailed Evaluation (≤500 words):
1. Cross‑Modal Consistency
• Major 1: Model name inconsistent (EPInformer vs EPIformer) across title, text, and figures. Evidence: “EPInformer” (Title) vs “we introduce EPIformer” (Introduction).
• Major 2: Promoter window length contradicts across sections. Evidence: “promoter region (1‑kb)” (Results, Overview) vs “2 kb region surrounding the TSS” (Methods).
• Major 3: Attention masking description conflicts with formula/implementation. Evidence: “ignoring enhancer‑enhancer interactions” (Results) vs “mask vector M…for padding enhancers” (Methods, equation).
• Minor 1: Figure 2 panel lettering referenced as “Fig. 2g” in text but caption uses “f” for enhancer‑activity panel. Evidence: “(Fig. 2g)” (Results) vs “f, The pre‑trained sequence encoder…” (Fig. 2 caption).
• Minor 2: Inconsistent model abbreviation in captions (e.g., “EPIformer‑EP‑Activity”). Evidence: Fig. 3d caption text vs elsewhere “PE‑Activity”.
• Minor 3: Occasional capitalization/typo mismatches in panel labels (e.g., “C”).
2. Textual Logical Soundness
• Major 1: Unclear/contradictory specification of how enhancer‑enhancer attention is masked; blocks understanding of interaction encoder behavior. Evidence: “ignoring enhancer‑enhancer interactions through attention masking” (Results) vs padding‑only mask (Methods).
• Minor 1: The Seq‑GraphReg comparison mixes different CV schemes (10‑fold reported vs 12‑fold here), potentially confounding the comparison. Evidence: “referenced performance metrics from its original 10‑fold…”.
• Minor 2: Several small typos (e.g., “EPIntformer”; inconsistent “nonredundant”/“non‑redundant”) occur but do not impede meaning.
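The masking ambiguity flagged above (padding-only mask in Methods vs enhancer–enhancer masking in Results) can be made concrete with a toy comparison. The sketch below is illustrative NumPy, not the paper's code; the mask construction is an assumption about what each description would imply.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(scores, mask):
    """Attention weights for one head; True entries in `mask` are blocked."""
    return softmax(np.where(mask, -1e9, scores), axis=-1)

n = 4                                        # token 0 = promoter, 1..3 = enhancer slots
pad = np.array([False, False, False, True])  # slot 3 is a padding enhancer

# (a) Padding-only mask, as the Methods equation describes:
pad_mask = np.tile(pad, (n, 1))

# (b) Promoter-enhancer-only mask, as the Results text describes:
#     additionally block all enhancer-to-enhancer attention.
pe_only = pad_mask.copy()
pe_only[1:, 1:] = True

scores = np.zeros((n, n))                    # uniform logits for illustration
w_pad = masked_attention(scores, pad_mask)
w_pe = masked_attention(scores, pe_only)
# Under (b) each enhancer attends only to the promoter; under (a),
# enhancers also attend to one another. The two are not equivalent.
```

Because the two masks produce different attention weights, the paper needs to state which one the implementation actually uses.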
3. Visual Aesthetics & Clarity
• Major 1: Fig. 3d locus browser panel is illegible at print size; many tracks/labels cannot be read, hindering verification. Evidence: Fig. 3d multi‑track view.
• Minor 1: Some axis labels/units are small or absent in bar charts (e.g., PearsonR not always on axis though values atop bars). Evidence: Fig. 2a–c.
• Minor 2: Fig. 1a contains dense small text in modules; harder to parse without caption. Evidence: Fig. 1a architecture diagram.
Key strengths:
• Explicit promoter–enhancer modeling with multi-head attention yields strong CAGE correlations (K562: 0.875; GM12878: 0.891) and outperforms Seq-GraphReg, Xpresso, and Enformer on the reported benchmarks.
• Interpretable: attention scores prioritize CRISPRi-FlowFISH-validated enhancers, and motif analysis recovers known cell-type-specific TFs.
• Lightweight (447,149 parameters; ~0.2% of Enformer) and fast to train (~1 hour on an A100 GPU).
Key weaknesses:
• Contradictory description of the attention mask (enhancer–enhancer masking in Results vs padding-only mask in Methods).
• Inconsistent model name (EPInformer vs EPIformer) and promoter window length (1 kb vs 2 kb) across sections.
• Fig. 3d is illegible at print size, preventing verification of the locus-level claims.
Recommended fixes (highest impact first):
1. Reconcile the attention-masking description between the Results text and the Methods equation, stating explicitly whether enhancer–enhancer attention is blocked.
2. Standardize the model name and the promoter window length throughout the text and figures.
3. Redraw Fig. 3d with legible tracks and labels.
4. Correct the Fig. 2 panel-letter reference and the “EPIformer‑EP‑Activity” caption abbreviation.
📋 AI Review from SafeReviewer will be automatically processed
This paper introduces EPInformer, a novel deep learning framework designed to predict gene expression by explicitly modeling promoter-enhancer interactions. The core innovation lies in the use of a multi-head attention mechanism to capture these interactions, integrating promoter and enhancer sequences, epigenomic signals, and chromatin contacts. The authors propose that this explicit modeling approach improves upon existing methods that implicitly learn these relationships. EPInformer is structured into four main modules: a sequence encoder, a feature fusion layer, an interaction encoder, and a predictor module. The sequence encoder uses convolutional layers to generate embeddings for promoter and enhancer sequences. The feature fusion layer combines these embeddings with epigenomic data, such as histone modification signals, and chromatin contact information derived from Hi-C experiments. The interaction encoder, the heart of the model, employs multi-head attention to model the complex relationships between promoters and their associated enhancers. Finally, the predictor module uses the learned representations to predict gene expression levels. The authors emphasize the model's efficiency, claiming it requires only 0.2% of the parameters of Enformer, a state-of-the-art model for gene expression prediction. The empirical evaluation of EPInformer is conducted on two human cell lines, K562 and GM12878, using both CAGE-seq and RNA-seq data. The results demonstrate that EPInformer outperforms existing models like Enformer and Xpresso in predicting gene expression levels, as measured by Pearson correlation coefficients. The authors also show that EPInformer can effectively prioritize cell-type-specific enhancers using attention scores, outperforming the Activity-by-Contact (ABC) score in precision-recall analysis. Furthermore, they demonstrate the model's ability to identify important transcription factor motifs within enhancer regions. 
The authors claim that EPInformer's ability to explicitly model promoter-enhancer interactions, combined with its computational efficiency, makes it a valuable tool for understanding gene regulation and identifying key regulatory elements. They also suggest that the model's interpretability, through attention scores and motif discovery, provides insights into the mechanisms underlying gene expression. The paper concludes by highlighting the potential of EPInformer for advancing our understanding of gene regulation and its implications for biological research and medicine. However, the authors acknowledge that further validation on additional cell lines and datasets is needed to fully establish the model's generalizability and robustness. They also suggest that future work could explore the application of EPInformer to other regulatory elements and the integration of additional data types to further improve its predictive power. Overall, the authors position EPInformer as a significant advancement in the field of gene expression prediction, offering a more accurate and efficient approach to modeling the complex regulatory landscape of the genome.
EPInformer introduces a novel approach to gene expression prediction by explicitly modeling promoter-enhancer interactions using a multi-head attention mechanism. This is a significant departure from previous models that often implicitly learn these relationships, and it allows for a more direct and interpretable representation of the regulatory landscape. The use of multi-head attention in the interaction encoder is a key technical innovation, enabling the model to capture complex relationships between promoters and their associated enhancers. The model's architecture is well-motivated, integrating promoter and enhancer sequences, epigenomic signals, and chromatin contacts in a coherent framework. The empirical results demonstrate that EPInformer outperforms existing models, such as Enformer and Xpresso, in predicting gene expression levels in two human cell lines, K562 and GM12878. The authors report Pearson correlation coefficients, showing that EPInformer achieves higher agreement between predicted and observed expression levels. This performance improvement is particularly notable when incorporating HiC data, highlighting the importance of chromatin contacts in gene regulation. The authors also demonstrate that EPInformer can effectively prioritize cell-type-specific enhancers using attention scores. The precision-recall analysis shows that EPInformer's attention scores outperform the Activity-by-Contact (ABC) score, suggesting that the model is capturing biologically relevant interactions. Furthermore, the ability to identify important transcription factor motifs within enhancer regions is a valuable contribution. The authors use the model to discover motifs for known regulators, providing insights into the mechanisms underlying gene expression. The model's efficiency is another significant strength. 
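The precision-recall comparison against the ABC score mentioned above reduces to ranking candidate enhancers by a score and computing average precision (AUPRC). For reference, a minimal, dependency-free sketch of that metric; this is illustrative only, not the authors' evaluation code.

```python
def average_precision(labels, scores):
    """AUPRC as average precision: mean precision at each true positive,
    scanning candidates in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:          # a validated (e.g., CRISPRi-FlowFISH) enhancer
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / max(tp, 1)

# Toy ranking: two validated enhancers among four candidates.
ap = average_precision(labels=[1, 0, 1, 0],
                       scores=[0.9, 0.8, 0.7, 0.1])  # ≈ 0.833
```

Swapping attention scores, ABC scores, distance, or Hi-C contacts in as `scores` against the same validated labels gives the head-to-head AUPRC comparison the review describes.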
The authors claim that EPInformer requires only 0.2% of the parameters of Enformer, making it more computationally tractable and accessible to researchers with limited resources. This efficiency does not come at the cost of performance, as EPInformer achieves comparable or better results than Enformer. The paper is generally well-written and easy to follow. The authors clearly explain the model architecture, the experimental setup, and the results. The figures and tables are informative and help to illustrate the key findings. The authors also provide a detailed description of the data preprocessing steps and the model training procedure, enhancing the reproducibility of the results. The use of both CAGE-seq and RNA-seq data for evaluation is a strength, as it demonstrates the model's ability to predict different aspects of gene expression. CAGE-seq measures transcription start sites, while RNA-seq measures steady-state mRNA abundance, providing a more comprehensive assessment of the model's performance. The authors also perform a 12-fold cross-chromosome validation, which is a rigorous evaluation strategy that helps to ensure the model's generalizability. This validation approach is more robust than a simple train/test split and provides more confidence in the model's performance. The paper's focus on interpretability is also a strength. The authors demonstrate how attention scores can be used to identify important enhancers, and how the model can be used to discover transcription factor motifs. This interpretability is crucial for understanding the biological mechanisms underlying gene expression and for identifying key regulatory elements.
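The 12-fold cross-chromosome validation praised above amounts to grouping genes by chromosome before splitting, so no chromosome appears in both train and test sets of a fold. A minimal sketch of such a splitter follows; the round-robin chromosome assignment is an assumption for illustration, not the authors' exact scheme.

```python
from collections import defaultdict

def chromosome_folds(gene_chroms, n_folds=12):
    """gene_chroms: iterable of (gene_id, chrom) pairs.
    Returns a list of n_folds test-gene lists, grouped by chromosome."""
    by_chrom = defaultdict(list)
    for gene, chrom in gene_chroms:
        by_chrom[chrom].append(gene)
    folds = [[] for _ in range(n_folds)]
    for i, chrom in enumerate(sorted(by_chrom)):  # round-robin assignment
        folds[i % n_folds].extend(by_chrom[chrom])
    return folds

# Toy genome: 44 genes spread over 22 autosomes, two genes per chromosome.
genes = [(f"g{i}", f"chr{1 + i % 22}") for i in range(44)]
folds = chromosome_folds(genes, n_folds=12)
```

Each fold's genes are held out for testing while the remaining folds train the model; because whole chromosomes move together, sequence overlap between train and test is avoided.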
Despite the strengths of EPInformer, several weaknesses need to be addressed. First, the evaluation of EPInformer is limited to two cell lines, K562 and GM12878. While the results are promising, it is unclear how well the model generalizes to other cell types. The authors acknowledge this limitation but do not provide a clear plan for addressing it. The lack of evaluation on a wider range of cell lines raises concerns about the model's robustness and applicability to diverse biological contexts. This is a significant limitation, as gene regulatory mechanisms can vary substantially across different cell types. The paper would benefit from a more thorough discussion of the potential challenges in generalizing EPInformer to other cell types, including the availability of epigenomic and chromatin contact data. (Confidence: High) Second, the comparison with existing models is not entirely fair. EPInformer uses HiC data, while the baseline models, Enformer and Xpresso, do not. This makes it difficult to isolate the contribution of the model architecture from the input features. The authors should have performed a more controlled comparison, where all models are trained with the same input features. For example, they could have trained Enformer and Xpresso with HiC data, if feasible, or trained EPInformer without HiC data to assess the impact of this feature. The current comparison does not allow for a clear understanding of whether the performance improvement is due to the model architecture or the inclusion of HiC data. This is a critical weakness that undermines the validity of the performance claims. (Confidence: High) Third, the paper lacks a detailed analysis of the model's performance on different types of genes. It is possible that EPInformer performs better on some gene categories than others, and this should be investigated. 
For example, the model might perform differently on housekeeping genes compared to tissue-specific genes, or on genes with different expression levels. The absence of such an analysis limits the understanding of the model's strengths and weaknesses and its potential biases. A more granular analysis of performance across different gene categories would provide a more complete picture of the model's capabilities. (Confidence: High) Fourth, the paper does not provide a detailed analysis of the computational cost of EPInformer. While the authors claim that it is efficient, a more rigorous analysis of training time, memory usage, and scalability is needed. The authors should have provided a comparison of the computational cost of EPInformer with other models, such as Enformer, to support their efficiency claims. This analysis should include details about the hardware used for training and inference, as well as the time and memory requirements for different input sizes. The lack of this information makes it difficult to assess the practical applicability of the model, especially for large-scale datasets. (Confidence: High) Fifth, the paper lacks a detailed discussion of the limitations of the model. It is important to acknowledge the potential biases in the training data and the challenges of applying the model to different cell types. The authors should have discussed the potential impact of these limitations on the model's performance and generalizability. For example, the model might be biased towards cell types with more available data, or it might not perform well on cell types with different regulatory mechanisms. A more thorough discussion of these limitations would provide a more balanced perspective on the model's capabilities. (Confidence: High) Sixth, the paper does not provide a detailed analysis of the model's interpretability. 
While the authors demonstrate the ability to identify important transcription factor motifs, a more in-depth analysis of the attention weights and their relationship to gene expression is needed. The authors should have investigated whether the attention weights correlate with known regulatory relationships, and whether they can be used to predict the impact of perturbations on gene expression. A more thorough analysis of the model's interpretability would increase confidence in its biological relevance. (Confidence: High) Finally, the paper lacks a detailed discussion of the potential applications of EPInformer in biological research and medicine. The authors should have provided concrete examples of how the model can be used to advance our understanding of gene regulation and disease. For example, they could have discussed how EPInformer can be used to identify novel regulatory elements, predict the impact of genetic variants on gene expression, or develop new therapeutic strategies. A more detailed discussion of the potential applications would highlight the practical value of the model and its potential impact on the field. (Confidence: High)
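The correlation analyses requested here (attention weights vs known regulatory relationships, in-silico perturbation effects vs CRISPRi effects, e.g., the reported PearsonR = 0.88 at KLF1) all reduce to a Pearson correlation between two score vectors. For reference, a dependency-free implementation; the example inputs are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy comparison: predicted perturbation effects vs measured effects.
r = pearson_r([0.1, 0.5, 0.9], [0.2, 0.4, 1.0])
```

Reporting such correlations per enhancer set, alongside confidence intervals, would address the interpretability analysis the review asks for.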
To address the identified weaknesses, I recommend several concrete improvements. First, the authors should evaluate EPInformer on a wider range of cell lines to assess its generalizability. This should include cell lines with varying levels of epigenomic data availability. The authors should also investigate the model's performance on different types of genes, such as housekeeping genes, tissue-specific genes, and genes with different expression levels. This analysis would provide a more complete picture of the model's strengths and weaknesses and its potential biases. Second, the authors should perform a more controlled comparison with existing models, where all models are trained with the same input features. This would involve either training Enformer and Xpresso with HiC data, if feasible, or training EPInformer without HiC data. This would allow for a more accurate assessment of the contribution of the model architecture versus the input features. The authors should also provide a detailed analysis of the computational cost of EPInformer, including training time, memory usage, and scalability. This analysis should be compared to other models, such as Enformer, to support the efficiency claims. The authors should also discuss the limitations of the model in more detail. This should include a discussion of the potential biases in the training data, the challenges of applying the model to different cell types, and the potential impact of these limitations on the model's performance and generalizability. The authors should also provide a more in-depth analysis of the model's interpretability. This should include an investigation of whether the attention weights correlate with known regulatory relationships, and whether they can be used to predict the impact of perturbations on gene expression. The authors should also explore the use of other interpretability techniques, such as saliency maps or feature importance scores. 
Furthermore, the authors should provide a more detailed discussion of the potential applications of EPInformer in biological research and medicine. This should include concrete examples of how the model can be used to advance our understanding of gene regulation and disease. For example, the authors could discuss how EPInformer can be used to identify novel regulatory elements, predict the impact of genetic variants on gene expression, or develop new therapeutic strategies. The authors should also consider exploring the use of EPInformer for other regulatory elements beyond promoters and enhancers, such as insulators or boundaries. This would broaden the scope of the model and increase its potential impact. Additionally, the authors should consider integrating additional data types, such as transcription factor binding data or RNA velocity data, to further improve the model's predictive power and biological relevance. The authors should also consider developing a user-friendly interface or software package for EPInformer, which would make it more accessible to researchers who may not have extensive experience with deep learning. This would facilitate the adoption of the model and increase its impact on the field. Finally, the authors should consider performing ablation studies to assess the contribution of each input feature to the model's performance. This would help to identify the most important features and potentially lead to a more efficient model. The authors should also consider performing a more rigorous statistical analysis of the results, including reporting confidence intervals and p-values. This would increase the reliability of the findings and provide a more solid foundation for the conclusions.
Several key questions remain unanswered, and addressing these would significantly strengthen the paper. First, how does EPInformer perform on cell lines with different levels of epigenomic data availability? The current evaluation is limited to two cell lines with relatively comprehensive data. It is unclear how the model would perform on cell lines with less available data, or with data of lower quality. Second, how does the model's performance vary across different gene families or functional categories? Are there specific types of genes for which EPInformer performs particularly well or poorly? Understanding these differences would provide valuable insights into the model's strengths and limitations. Third, what is the computational cost of EPInformer compared to other models, and how does it scale with the size of the input data? While the authors claim that EPInformer is efficient, a more detailed analysis of training time, memory usage, and scalability is needed. Fourth, what are the potential biases in the training data, and how might these biases affect the model's performance? For example, is the model biased towards cell types with more available data, or towards genes with higher expression levels? Fifth, how can the attention weights be used to gain insights into the regulatory mechanisms underlying gene expression? Do the attention weights correlate with known regulatory relationships, and can they be used to predict the impact of perturbations on gene expression? Sixth, what are the most promising applications of EPInformer in biological research and medicine? The authors should provide concrete examples of how the model can be used to advance our understanding of gene regulation and disease. Seventh, how does EPInformer handle novel or previously uncharacterized enhancers? Does the model rely solely on known enhancer regions, or can it identify new regulatory elements? Eighth, how robust is EPInformer to noise or errors in the input data? 
How does the model's performance change when the input data is perturbed or contains missing values? Ninth, can the model be used to predict the effects of genetic variants on gene expression? If so, how accurate are these predictions, and how do they compare to other methods for variant effect prediction? Finally, what are the limitations of the current model, and what are the next steps for improving its performance and generalizability? The authors should discuss the potential challenges in applying EPInformer to different biological contexts and the future directions for research in this area.