2511.0021 A scalable deep learning framework for gene expression prediction by integrating promoter-enhancer sequences with multimodal epigenomic data v1

🎯 ICAIS2025 Submission

🎓 Meta Review & Human Decision

Decision:

Reject

Meta Review:

AI Review from DeepReviewer

📋 Summary

This paper introduces EPInformer, a novel deep learning framework designed to predict gene expression by integrating promoter-enhancer interactions with their associated sequences, epigenomic signals, and chromatin contacts. The core of EPInformer lies in its transformer-based architecture, which explicitly models the complex interplay between promoters and potential enhancers. Unlike many existing models that primarily focus on promoter-proximal elements, EPInformer leverages multi-head attention mechanisms to capture the regulatory influence of distal enhancers, incorporating epigenomic data such as H3K27ac and DNase-seq signals, as well as Hi-C contact information. The model's architecture is composed of a sequence encoder, a fusion layer, an interaction encoder, and a predictor. The sequence encoder processes DNA sequences, while the fusion layer integrates epigenomic signals and chromatin contact data. The interaction encoder, which is the heart of the model, uses transformer layers with multi-head attention to learn the relationships between promoters and potential enhancers. Finally, the predictor combines these interactions to forecast gene expression levels. The authors trained and evaluated EPInformer on two well-characterized cell lines, K562 and GM12878, using both CAGE-seq and RNA-seq data. The model's performance was rigorously assessed through 12-fold cross-chromosome validation, demonstrating superior predictive accuracy compared to existing models such as Enformer, Xpresso, and Seq-GraphReg. Furthermore, the authors validated the model's ability to recapitulate enhancer-gene interactions identified by CRISPR perturbation experiments, highlighting its biological relevance. The authors also explored the model's interpretability by identifying important transcription factor motifs within enhancer regions, providing insights into the underlying regulatory mechanisms. 
EPInformer's lightweight design, requiring significantly fewer parameters than state-of-the-art models, makes it a practical tool for researchers studying gene regulation. The authors have made the model available as open-source software, further enhancing its accessibility and potential impact on the field. Overall, this work presents a significant advancement in the field of gene expression prediction by effectively integrating multi-modal data and providing a robust and interpretable framework for understanding gene regulation.
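The 12-fold cross-chromosome validation described above hinges on keeping every chromosome's genes entirely inside one fold, so no test gene shares a locus with training genes. Below is a minimal pure-Python sketch of such a split; it is illustrative only — the `cross_chromosome_folds` helper and its greedy balancing heuristic are assumptions, not the authors' pipeline.

```python
from collections import defaultdict

def cross_chromosome_folds(gene_to_chrom, n_folds=12):
    """Partition genes into folds such that no chromosome is split
    across folds -- a leave-chromosomes-out CV scheme in the spirit of
    the 12-fold cross-chromosome validation described above.
    (Illustrative sketch, not the authors' actual pipeline.)"""
    # Group genes by chromosome so each chromosome moves as a unit.
    by_chrom = defaultdict(list)
    for gene, chrom in gene_to_chrom.items():
        by_chrom[chrom].append(gene)
    # Greedily assign the largest chromosomes first, always to the
    # currently smallest fold, to keep fold sizes roughly balanced.
    folds = [[] for _ in range(n_folds)]
    for chrom in sorted(by_chrom, key=lambda c: -len(by_chrom[c])):
        min(folds, key=len).extend(by_chrom[chrom])
    return folds

# Toy example: 6 genes on 4 chromosomes, split into 2 folds.
genes = {"g1": "chr1", "g2": "chr1", "g3": "chr2",
         "g4": "chr3", "g5": "chr4", "g6": "chr4"}
folds = cross_chromosome_folds(genes, n_folds=2)
```

Holding out whole chromosomes, rather than random genes, is what reduces locus leakage between train and test sets.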

✅ Strengths

EPInformer presents several notable strengths that contribute to its significance in the field of gene expression prediction. First and foremost, the model introduces a novel transformer-based architecture that explicitly models promoter-enhancer interactions, a crucial aspect of gene regulation that is often overlooked by other models. By incorporating multi-head attention mechanisms, EPInformer is able to capture the complex interplay between promoters and distal enhancers, leading to improved predictive accuracy. This is a significant advancement over models that primarily focus on promoter-proximal elements. The model's ability to integrate multiple data modalities, including DNA sequences, epigenomic signals (H3K27ac and DNase-seq), and chromatin contacts (Hi-C data), further enhances its predictive power. This multi-faceted approach allows EPInformer to capture a more comprehensive view of the regulatory landscape, leading to more accurate predictions. The rigorous evaluation of EPInformer on two well-characterized cell lines, K562 and GM12878, using both CAGE-seq and RNA-seq data, demonstrates its robustness and generalizability. The 12-fold cross-chromosome validation provides a stringent test of the model's performance, and the results show that EPInformer outperforms existing models such as Enformer, Xpresso, and Seq-GraphReg. This superior performance is a testament to the effectiveness of the model's architecture and its ability to capture the underlying regulatory mechanisms. Furthermore, the model's ability to recapitulate enhancer-gene interactions validated by CRISPR perturbation experiments provides strong evidence for its biological relevance. This validation step is crucial for demonstrating that the model's predictions are not just statistical artifacts but reflect actual biological processes. The authors also explored the model's interpretability by identifying important transcription factor motifs within enhancer regions. 
This analysis provides valuable insights into the mechanisms of gene regulation and allows researchers to gain a deeper understanding of the regulatory code. Finally, the lightweight design of EPInformer, requiring significantly fewer parameters than state-of-the-art models, makes it a practical tool for researchers. This efficiency allows for rapid training and deployment in various cell types, making the model more accessible to the broader scientific community. The open-source availability of the model further enhances its potential impact on the field.

❌ Weaknesses

Despite its strengths, EPInformer exhibits several limitations that warrant careful consideration. A primary concern is the limited scope of the experimental validation, which was conducted exclusively on two cell lines, K562 and GM12878. As the paper itself states, "To develop and evaluate EPInformer models for gene expression prediction, we initially used the ABC pipeline… to identify candidate promoter-enhancer pairs for coding genes in two well-characterized celines [sic], K562 and GM12878." This narrow focus raises significant questions about the model's generalizability to other cell types, particularly those with different chromatin landscapes and transcription factor binding patterns. The performance of EPInformer in cell types with more complex chromatin interactions or less available epigenomic data remains unclear, and the lack of diversity in the training and testing data limits the model's applicability to a broader range of biological contexts. This is a critical limitation, as gene regulatory mechanisms can vary significantly across cell types and organisms; my analysis confirms that all experimental results presented in the paper are based solely on these two cell lines. Another significant weakness lies in the model's reliance on enhancer activity data derived from experimental measurements of H3K27ac and DNase-seq signals. The paper describes the calculation of enhancer activity using MACS2 and averaging replicate signals, but it does not explore the model's sensitivity to noise or inaccuracies in this data. As the paper states, "The fusion layer starts by concatenating the epigenomic signals of candidate enhancers with their sequence embeddings..." This dependence on enhancer activity data introduces a potential bottleneck, as the quality of such data can vary considerably across experiments.
The model's performance may degrade when enhancer activity data is noisy or incomplete, a common issue in experimental datasets, and the paper offers no analysis of the model's robustness to such data limitations. Furthermore, the model's architecture, while effective for pairwise promoter-enhancer interactions, may not capture interactions among multiple enhancers or between enhancers and other genomic elements. The paper explicitly states, "Notably, the interaction encoder focuses solely on enhancer-promoter interactions, ignoring enhancer-enhancer interactions through attention masking." This simplification may overlook higher-order regulatory relationships and the combinatorial effects of multiple enhancers acting in concert, yielding an incomplete picture of loci where several regulatory elements act together. Finally, the model's performance may be affected by the quality and resolution of the input Hi-C data. The paper mentions using KR-normalized Hi-C data but does not analyze how different Hi-C resolutions or data processing methods impact performance. As the paper states, "Additionally, chromatin contacts of promoter and candidate enhancer pairs were obtained from KR-normalized HiC contact maps using the ABC pipeline." The reliance on Hi-C data, which can be noisy and resolution-dependent, raises concerns about robustness to variations in data quality, and the absence of any sensitivity analysis leaves this concern unaddressed.
These limitations, which I have verified through direct examination of the paper, significantly impact the model's generalizability and robustness, and they should be addressed in future work. My confidence in these identified weaknesses is high, as they are directly supported by the paper's content and lack of specific analyses.

💡 Suggestions

To address the identified weaknesses and further enhance the utility of EPInformer, several improvements could be considered. First, the model should be evaluated on a more diverse set of cell types, including those with different epigenetic landscapes and transcription factor binding patterns; cell types with distinct differentiation states or disease backgrounds would be especially valuable. This would provide a more robust assessment of the model's generalizability and help identify potential biases. Second, the model's performance should be evaluated under conditions of varying data quality, including noisy or incomplete enhancer activity data, to determine its sensitivity to data limitations and to identify strategies for improving robustness. It would also be beneficial to incorporate uncertainty in the input data, for example through Bayesian approaches or data augmentation, allowing the model to handle noisy or missing data more gracefully. Third, to move beyond pairwise promoter-enhancer interactions, the architecture could be extended to capture interactions among multiple enhancers and other genomic elements, for instance via graph-based methods or attention mechanisms that model higher-order interactions; a graph neural network layer that explicitly models interactions between multiple enhancers and their joint influence on gene expression is one concrete option.
Additionally, the model could be extended to incorporate other genomic data types, such as DNA methylation or transcription factor binding data, to capture additional regulatory relationships and improve predictive performance. The model should also be evaluated on datasets with varying Hi-C resolutions, together with a sensitivity analysis across different Hi-C data processing methods, to identify the optimal processing pipeline and improve robustness to variations in data quality. Finally, a more detailed interpretability analysis is needed. While the paper provides some analysis of attention scores, a deeper examination of the model's decision-making process would be valuable: visualizing attention patterns, quantifying the importance of different input features, and comparing the model's predictions with experimental data. Such analysis should also aim to identify the specific regulatory mechanisms the model has learned and how they contribute to gene expression prediction, which would provide insights into the underlying biology and help identify potential therapeutic targets. These suggestions, which follow directly from the identified weaknesses, would significantly enhance the utility and impact of EPInformer.

❓ Questions

Several key questions arise from my analysis of this paper, focusing on the model's limitations and potential areas for improvement. First, how does EPInformer perform in cell types with more complex chromatin interactions or less available epigenomic data? The current evaluation is limited to two well-characterized cell lines, and it is unclear how the model would generalize to contexts where the chromatin landscape is more complex or epigenomic data is sparse. Second, how does EPInformer handle enhancers that regulate multiple genes, or genes regulated by multiple enhancers? The current architecture focuses on pairwise promoter-enhancer interactions, yet gene regulation is often a complex interplay of multiple regulatory elements. Third, how does EPInformer perform when enhancer activity data is noisy or incomplete? The model relies on enhancer activity derived from experimental measurements, which are often subject to noise and bias. Fourth, how does EPInformer behave when Hi-C data is unavailable or of low quality? Hi-C is an optional input, and it is unclear how performance degrades in its absence or with poor-quality contact maps. Fifth, how does EPInformer compare to other models that incorporate 3D chromatin structure information, such as those using HiChIP or ChIA-PET data? The paper compares EPInformer to Enformer, Xpresso, and Seq-GraphReg, but not to models that explicitly incorporate 3D chromatin structure.
This comparison matters because such models may provide a more comprehensive view of the regulatory landscape. Finally, how does EPInformer's multi-head attention mechanism for modeling promoter-enhancer interactions compare to other attention-based methods for identifying regulatory interactions? The choice of attention mechanism can significantly impact performance. These questions, which follow directly from the identified weaknesses, highlight key areas for future research and development.

📊 Scores

Soundness: 2.75
Presentation: 3.0
Contribution: 2.75
Rating: 6.0

AI Review from ZGCA

📋 Summary

The paper presents EPInformer, a transformer-based framework for predicting gene expression by explicitly modeling promoter–enhancer interactions. The architecture comprises: (i) a CNN-based sequence encoder (residual + dilated convolutions) to learn embeddings from promoter (2 kb around TSS) and putative enhancer sequences (open chromatin within 100 kb of the TSS), (ii) a fusion layer to integrate epigenomic signals (e.g., DNase, H3K27ac), distance, and optional HiC contacts with sequence embeddings, (iii) an interaction encoder with multi-head attention that focuses solely on promoter–enhancer interactions via masking (ignoring enhancer–enhancer), and (iv) a predictor that combines the promoter representation with genomic covariates (e.g., mRNA half-life features, promoter H3K27ac) to predict CAGE-seq or RNA-seq. The sequence encoder can be pre-trained on enhancer activity (geometric mean of H3K27ac and DNase). The paper evaluates multiple variants: EPInformer-promoter, EPInformer-PE (promoter+enhancer sequences+distance), EPInformer-PE-Activity (adds DNase/H3K27ac), and EPInformer-PE-Activity-HiC (adds HiC). Using 12-fold cross-chromosome validation, EPInformer-PE-Activity-HiC achieves high Pearson correlations for CAGE (K562: 0.875; GM12878: 0.891), outperforming Seq-GraphReg, and surpasses Xpresso for RNA-seq. On Enformer’s hold-out set, EPInformer variants exceed Enformer’s PearsonR by 7.3% (K562) and 9.1% (GM12878). Attention scores prioritize CRISPRi-FlowFISH validated enhancers better than ABC, distance, activity, and HiC; combining attention with ABC (attention-ABC) further improves AUPRC. In-silico perturbations at KLF1 correlate strongly with CRISPRi effects (PearsonR = 0.88). TF-MoDISco-lite and TangerMEME applied to attention-prioritized enhancers recover known, cell-type-specific TF motifs (e.g., GATA1/2 in K562; SPI1 in GM12878). The model is lightweight (447,149 parameters; ~0.2% of Enformer) and trains in ~1 hour on an A100 GPU.
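Two of the mechanisms summarized above are simple enough to sketch in code: the enhancer-activity pretraining target (geometric mean of H3K27ac and DNase signal) and an attention mask that permits promoter-enhancer links while blocking enhancer-enhancer and padding links. The following is a hedged reconstruction of those ideas; the function names and exact mask layout are assumptions, not the paper's implementation.

```python
import math

def enhancer_activity(h3k27ac, dnase):
    """Enhancer activity as the geometric mean of H3K27ac and DNase
    signal -- the pretraining target described for the sequence
    encoder. (Generic formula; units depend on upstream processing.)"""
    return math.sqrt(h3k27ac * dnase)

def ep_attention_mask(n_enhancers, n_padding=0):
    """Boolean attention mask over a token sequence laid out as
    [promoter, enh_1, ..., enh_n, pad, ...]; True = attention allowed.
    Promoter-enhancer links and self-attention are kept; enhancer-
    enhancer links and padding slots are blocked, mirroring the
    masking idea in the interaction encoder. (Illustrative
    reconstruction, not the authors' implementation.)"""
    n = 1 + n_enhancers + n_padding
    mask = [[False] * n for _ in range(n)]
    for i in range(1 + n_enhancers):  # real (non-padding) tokens only
        mask[i][0] = True             # every real token attends to promoter
        mask[0][i] = True             # promoter attends to every real token
        mask[i][i] = True             # self-attention
    return mask

# Promoter plus 2 candidate enhancers, 1 padding slot.
m = ep_attention_mask(n_enhancers=2, n_padding=1)
```

In frameworks like PyTorch, such a boolean matrix would typically be converted to an additive mask (0 where allowed, a large negative value where blocked) before the softmax.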

✅ Strengths

  • Clear, modular architecture that explicitly targets promoter–enhancer interactions through attention masking, improving interpretability and efficiency (Results, Overview; Methods).
  • Flexible fusion layer that integrates sequences with epigenomic signals and optional HiC, supporting multiple data availability scenarios (Results, Overview).
  • Strong empirical performance: in 12-fold cross-chromosome validation, EPInformer-PE-Activity-HiC achieves PearsonR 0.875 (K562) and 0.891 (GM12878) for CAGE; EPInformer-PE-Activity outperforms Enformer on hold-out genes without HiC (Fig. 2b–c).
  • Systematic evaluation across model variants (promoter-only, PE, PE+Activity, PE+Activity+HiC) and modalities (CAGE, RNA-seq), highlighting the contribution of distal enhancers and added data types (Fig. 2a–e).
  • Attention-based enhancer prioritization surpasses ABC, distance, activity, and HiC alone; the attention-ABC combination yields the best AUPRC (Fig. 3b–c).
  • Causal plausibility checks: in-silico perturbations at KLF1 agree with CRISPRi-FlowFISH measurements (PearsonR = 0.88), including detection of a distal repressor (Results, Fig. 3e–f).
  • Motif-level interpretability: TF-MoDISco-lite and TangerMEME on attention-prioritized enhancers recover known regulators (GATA1/2, SPI1, etc.), supporting biological coherence (Fig. 4a–c).
  • Computationally accessible: ~447k parameters (~0.2% of Enformer), ~1 hour training on A100, open-source code and data (Discussion; Code availability).
  • Rigorous data splitting via 12-fold cross-chromosome validation spanning all chromosomes, reducing locus leakage (Methods; Results).
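The AUPRC comparisons cited above (attention vs ABC, distance, activity, and Hi-C scores) reduce to ranking candidate enhancers by a score and measuring precision-recall against CRISPR-validated labels. A generic average-precision sketch in pure Python follows; the toy scores and labels are hypothetical, not values from the paper.

```python
def average_precision(scores, labels):
    """Average precision (a common AUPRC estimate): candidates are
    ranked by descending score, and precision@k is averaged over the
    ranks of the true positives. This is the style of metric used to
    benchmark attention-based enhancer prioritization against ABC.
    (Generic metric implementation; inputs below are toy data.)"""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, total, ap = 0, sum(labels), 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            ap += hits / k  # precision at this positive's rank
    return ap / total if total else 0.0

# Toy ranking: 2 CRISPR-validated enhancers among 3 candidates.
ap = average_precision([0.9, 0.8, 0.7], [1, 0, 1])  # (1/1 + 2/3) / 2
```

Because validated enhancers are rare among candidates, average precision rewards methods that concentrate true positives at the top of the ranking, which is exactly the property the attention-ABC combination is credited with.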

❌ Weaknesses

  • Biological oversimplification risk: the interaction encoder masks enhancer–enhancer interactions, potentially missing cooperative enhancer effects known to influence gene regulation (Results, Overview).
  • Limited scope of empirical validation: primary results are on two immortalized cell lines (K562, GM12878); generalization to diverse primary tissues/cell types is not demonstrated.
  • Comparisons sometimes involve differing input modalities (e.g., EPInformer-PE-Activity-HiC vs Xpresso’s promoter-only sequence model), making fairness of certain benchmarks less direct; a stricter like-for-like comparison (e.g., sequence-only) is limited to the EPInformer-promoter baseline, which underperforms Xpresso (Results).
  • Candidate enhancer definition and cap (≤60 per gene within 100 kb from TSS) are inherited from ABC/DNase-based pipelines; sensitivity to these choices (window size, cap, peak-calling thresholds) is not deeply explored.
  • Reproducibility details for the main sequence encoder could be expanded (e.g., exact kernel sizes/channels for all residual/dilated conv layers in the main encoder are more fully specified for the enhancer-activity pretraining model than for the gene-expression model).
  • The manuscript alternates between 'EPIformer' and 'EPInformer' nomenclature; consistent naming would improve clarity.

❓ Questions

  • How sensitive is performance to ignoring enhancer–enhancer interactions? Could you provide an ablation where attention among enhancers is enabled (e.g., full self-attention) to quantify any trade-off in accuracy and compute?
  • What is the effect of varying the enhancer search window (e.g., 50 kb, 200 kb) and the cap of 60 candidate enhancers per gene on both prediction accuracy and enhancer prioritization AUPRC?
  • Can you report additional like-for-like baselines (e.g., EPInformer with sequence-only inputs) against Enformer/Xpresso to more cleanly isolate architectural contributions from multimodal inputs?
  • Generalization: can the model trained on one cell type transfer to a new cell type with limited fine-tuning (e.g., DNase-only), and how does this compare to Enformer or GraphReg in cross-cell-type settings?
  • Calibration and stability of attention scores: how consistent are enhancer rankings across random seeds and cross-chromosome folds? Are there instances where attention focuses on enhancers with minimal predicted effect under in-silico perturbation?
  • Pretraining and potential circularity: could you detail how pretraining on enhancer activity (H3K27ac and DNase) impacts downstream gene expression prediction when those same signals are also provided at training time? Have you evaluated a variant without pretraining?
  • HiC input sparseness: when HiC is unavailable or low coverage, how does EPInformer-PE-Activity perform across loci of varying contact frequencies? Can your attention-ABC strategy maintain gains without HiC across additional genes beyond KLF1?
  • Multiple TSS per gene and promoter choice: how do you handle genes with alternative TSSs, and does promoter selection materially influence predictions?
  • Could you share more architectural hyperparameters (filter numbers, kernel sizes, pooling specifics) for the main sequence encoder used in gene expression prediction to fully enable reproduction?
  • Runtime and memory: can you provide batch size vs. GPU memory utilization under different model variants to guide users with modest hardware?

⚠️ Limitations

  • Modeling choice masks enhancer–enhancer interactions, which may miss cooperative regulatory effects and complex chromatin architectures.
  • Reliance on DNase/H3K27ac peaks (ABC pipeline) to define candidate enhancers; performance may depend on peak-calling thresholds and assay quality, and enhancers outside 100 kb or beyond the top 150k peaks are not considered.
  • Empirical evaluation is limited to two cell lines; broader generalization (e.g., primary tissues, diverse cell types) is not demonstrated.
  • Attention weights, while useful, should not be over-interpreted as causal without perturbation; the paper partly mitigates this with in-silico perturbations and CRISPRi-FlowFISH comparisons for selected loci.
  • Potential modality mismatch in comparisons: some baselines (e.g., Xpresso) are sequence-only while EPInformer variants incorporate epigenomics and HiC, complicating fairness of direct performance claims.
  • Potential negative societal impacts are limited but include misinterpretation of enhancer causality from attention alone and downstream use in clinical inference without sufficient validation across populations and contexts.

🖼️ Image Evaluation

Cross‑Modal Consistency: 30/50

Textual Logical Soundness: 22/30

Visual Aesthetics & Clarity: 13/20

Overall Score: 65/100

Detailed Evaluation (≤500 words):

1. Cross‑Modal Consistency

• Major 1: Model name inconsistent (EPInformer vs EPIformer) across title, text, and figures. Evidence: “EPInformer” (Title) vs “we introduce EPIformer” (Introduction).

• Major 2: Promoter window length contradicts across sections. Evidence: “promoter region (1‑kb)” (Results, Overview) vs “2 kb region surrounding the TSS” (Methods).

• Major 3: Attention masking description conflicts with formula/implementation. Evidence: “ignoring enhancer‑enhancer interactions” (Results) vs “mask vector M…for padding enhancers” (Methods, equation).

• Minor 1: Figure 2 panel lettering referenced as “Fig. 2g” in text but caption uses “f” for enhancer‑activity panel. Evidence: “(Fig. 2g)” (Results) vs “f, The pre‑trained sequence encoder…” (Fig. 2 caption).

• Minor 2: Inconsistent model abbreviation in captions (e.g., “EPIformer‑EP‑Activity”). Evidence: Fig. 3d caption text vs elsewhere “PE‑Activity”.

• Minor 3: Occasional capitalization/typo mismatches in panel labels (e.g., “C”).

2. Text Logic

• Major 1: Unclear/contradictory specification of how enhancer‑enhancer attention is masked; blocks understanding of interaction encoder behavior. Evidence: “ignoring enhancer‑enhancer interactions through attention masking” (Results) vs padding‑only mask (Methods).

• Minor 1: Seq‑GraphReg comparison mixes different CV schemes (10‑fold reported vs 12‑fold here), potentially confounding. Evidence: “referenced performance metrics from its original 10‑fold…”.

• Minor 2: Several small typos (EPIntformer, nonredundant/non‑redundant) but do not impede meaning.

3. Figure Quality

• Major 1: Fig. 3d locus browser panel is illegible at print size; many tracks/labels cannot be read, hindering verification. Evidence: Fig. 3d multi‑track view.

• Minor 1: Some axis labels/units are small or absent in bar charts (e.g., PearsonR not always on axis though values atop bars). Evidence: Fig. 2a–c.

• Minor 2: Fig. 1a contains dense small text in modules; harder to parse without caption. Evidence: Fig. 1a architecture diagram.

Key strengths:

  • Clear benchmarking with consistent, sizable gains over baselines (e.g., CAGE: 0.875/0.891; hold‑out over Enformer).
  • Attention‑based enhancer prioritization rigorously evaluated (AUPRC improvements; attention‑ABC best).
  • Pretrained sequence encoder shows meaningful enhancer‑activity prediction and motif recovery.

Key weaknesses:

  • Critical inconsistencies: model name; promoter window size; masking strategy vs formula.
  • Some panel labeling mismatches (Fig. 2f/g) and typos in model variant names.
  • Illegible multi‑track locus figure impedes “figure‑alone” comprehension.

Recommended fixes (highest impact first):

  • Unify model name (choose EPInformer or EPIformer) across text, figures, and repo.
  • Resolve promoter window inconsistency (1 kb vs 2 kb) and align input tensor description accordingly.
  • Precisely define and depict attention masks that block enhancer‑enhancer edges; update formula and schematic.
  • Harmonize Fig. 2 sub‑panel letters with all text references; add explicit axis labels/units.
  • Redesign Fig. 3d with larger fonts, fewer tracks, or focused zooms; add clear legends/call‑outs.

📊 Scores

Originality: 3
Quality: 3
Clarity: 3
Significance: 3
Soundness: 3
Presentation: 3
Contribution: 3
Rating: 7

AI Review from SafeReviewer

📋 Summary

This paper introduces EPInformer, a novel deep learning framework designed to predict gene expression by explicitly modeling promoter-enhancer interactions. The core innovation lies in the use of a multi-head attention mechanism to capture these interactions, integrating promoter and enhancer sequences, epigenomic signals, and chromatin contacts. The authors propose that this explicit modeling approach improves upon existing methods that implicitly learn these relationships. EPInformer is structured into four main modules: a sequence encoder, a feature fusion layer, an interaction encoder, and a predictor module. The sequence encoder uses convolutional layers to generate embeddings for promoter and enhancer sequences. The feature fusion layer combines these embeddings with epigenomic data, such as histone modification signals, and chromatin contact information derived from Hi-C experiments. The interaction encoder, the heart of the model, employs multi-head attention to model the complex relationships between promoters and their associated enhancers. Finally, the predictor module uses the learned representations to predict gene expression levels. The authors emphasize the model's efficiency, claiming it requires only 0.2% of the parameters of Enformer, a state-of-the-art model for gene expression prediction. The empirical evaluation of EPInformer is conducted on two human cell lines, K562 and GM12878, using both CAGE-seq and RNA-seq data. The results demonstrate that EPInformer outperforms existing models like Enformer and Xpresso in predicting gene expression levels, as measured by Pearson correlation coefficients. The authors also show that EPInformer can effectively prioritize cell-type-specific enhancers using attention scores, outperforming the Activity-by-Contact (ABC) score in precision-recall analysis. Furthermore, they demonstrate the model's ability to identify important transcription factor motifs within enhancer regions. 
The authors claim that EPInformer's ability to explicitly model promoter-enhancer interactions, combined with its computational efficiency, makes it a valuable tool for understanding gene regulation and identifying key regulatory elements. They also suggest that the model's interpretability, through attention scores and motif discovery, provides insights into the mechanisms underlying gene expression. The paper concludes by highlighting the potential of EPInformer for advancing our understanding of gene regulation and its implications for biological research and medicine. However, the authors acknowledge that further validation on additional cell lines and datasets is needed to fully establish the model's generalizability and robustness. They also suggest that future work could explore the application of EPInformer to other regulatory elements and the integration of additional data types to further improve its predictive power. Overall, the authors position EPInformer as a significant advancement in the field of gene expression prediction, offering a more accurate and efficient approach to modeling the complex regulatory landscape of the genome.
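The headline metric throughout the evaluation above is the Pearson correlation between predicted and observed expression levels. For concreteness, here is the standard formula in pure Python; the toy values are hypothetical, and real evaluations would use per-gene (typically log-scaled) expression values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length
    sequences -- the agreement measure used to compare predicted and
    observed gene expression across models. (Textbook formula, not
    code from the paper.)"""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Perfectly linear toy data correlates at r = 1.0.
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Pearson's r captures linear agreement only, which is one reason reviews of this kind often ask for complementary breakdowns (e.g., performance by gene category) rather than a single correlation per cell line.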

✅ Strengths

EPInformer introduces a novel approach to gene expression prediction by explicitly modeling promoter-enhancer interactions with a multi-head attention mechanism. This is a significant departure from previous models, which often learn these relationships only implicitly, and it yields a more direct and interpretable representation of the regulatory landscape. The multi-head attention in the interaction encoder is the key technical innovation, enabling the model to capture complex relationships between promoters and their associated enhancers. The architecture is well motivated, integrating promoter and enhancer sequences, epigenomic signals, and chromatin contacts in a coherent framework.

The empirical results show that EPInformer outperforms existing models such as Enformer and Xpresso at predicting gene expression in two human cell lines, K562 and GM12878. The reported Pearson correlation coefficients indicate higher agreement between predicted and observed expression, and the improvement is particularly notable when Hi-C data are incorporated, highlighting the importance of chromatin contacts in gene regulation. The authors also show that EPInformer can prioritize cell-type-specific enhancers using attention scores: in a precision-recall analysis, the attention scores outperform the Activity-by-Contact (ABC) score, suggesting that the model captures biologically relevant interactions. The ability to recover transcription factor motifs for known regulators within enhancer regions is a further valuable contribution.

Efficiency is another significant strength. The authors report that EPInformer requires only 0.2% of Enformer's parameters, making it more computationally tractable and accessible to researchers with limited resources, and this efficiency does not come at the cost of performance, as EPInformer achieves comparable or better results than Enformer.

The paper is generally well written and easy to follow. The model architecture, experimental setup, and results are clearly explained; the figures and tables are informative; and the detailed description of data preprocessing and model training enhances reproducibility. Evaluating on both CAGE-seq and RNA-seq data is a strength: CAGE-seq measures transcription start-site activity while RNA-seq measures steady-state mRNA abundance, giving a more comprehensive assessment of the model's performance. The 12-fold cross-chromosome validation is a rigorous evaluation strategy, more robust than a simple train/test split, and lends confidence in the model's generalizability.

Finally, the focus on interpretability is a strength. The authors demonstrate how attention scores can identify important enhancers and how the model can discover transcription factor motifs, which is crucial for understanding the biological mechanisms underlying gene expression and for identifying key regulatory elements.
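The 12-fold cross-chromosome validation praised above can be sketched as grouping chromosomes into held-out folds so that no gene from a test chromosome ever appears in training. This is a minimal illustration with a hypothetical gene-to-chromosome mapping, not the authors' actual pipeline:

```python
from collections import defaultdict

def cross_chromosome_folds(gene_to_chrom, n_folds=12):
    """Partition chromosomes into n_folds groups; all genes on a held-out
    chromosome group form the test set of one fold, so train and test
    never share a chromosome."""
    by_chrom = defaultdict(list)
    for gene, chrom in gene_to_chrom.items():
        by_chrom[chrom].append(gene)
    chroms = sorted(by_chrom)  # deterministic chromosome-to-fold assignment
    folds = []
    for i in range(n_folds):
        test_chroms = set(chroms[i::n_folds])  # round-robin grouping
        test = [g for c in test_chroms for g in by_chrom[c]]
        train = [g for g, c in gene_to_chrom.items() if c not in test_chroms]
        folds.append((train, test))
    return folds

# Hypothetical toy assignment of 100 genes to 22 autosomes
genes = {f"gene{i}": f"chr{(i % 22) + 1}" for i in range(100)}
folds = cross_chromosome_folds(genes, n_folds=12)
```

Each gene appears in exactly one fold's test set, which is what makes this stricter than a random train/test split: the model cannot exploit chromosome-local sequence or contact structure shared between splits.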

❌ Weaknesses

Despite these strengths, several weaknesses need to be addressed.

First, the evaluation is limited to two cell lines, K562 and GM12878. While the results are promising, it is unclear how well the model generalizes to other cell types, and the authors acknowledge this limitation without offering a clear plan to address it. This is significant, since gene regulatory mechanisms can vary substantially across cell types; the paper would benefit from a more thorough discussion of the challenges in generalizing EPInformer, including the availability of epigenomic and chromatin-contact data for other cell types. (Confidence: High)

Second, the comparison with existing models is not entirely fair: EPInformer uses Hi-C data while the baselines, Enformer and Xpresso, do not, making it difficult to isolate the contribution of the architecture from that of the input features. A more controlled comparison would train all models on the same inputs, for example by training Enformer and Xpresso with Hi-C data where feasible, or by training EPInformer without it. As presented, it is unclear whether the improvement stems from the architecture or from the additional Hi-C input, which undermines the validity of the performance claims. (Confidence: High)

Third, the paper lacks an analysis of performance across different types of genes. The model might behave differently on housekeeping versus tissue-specific genes, or across expression levels; the absence of such an analysis obscures the model's strengths, weaknesses, and potential biases, and a more granular breakdown would give a more complete picture of its capabilities. (Confidence: High)

Fourth, the computational cost of EPInformer is not analyzed in detail. The efficiency claims should be supported by a comparison of training time, memory usage, and scalability against models such as Enformer, including the hardware used and the requirements for different input sizes; without this, the model's practical applicability to large-scale datasets is hard to assess. (Confidence: High)

Fifth, the limitations of the model are not discussed in depth. The authors should address potential biases in the training data and the challenges of applying the model to different cell types, and their impact on performance and generalizability. For example, the model might favor cell types with more available data, or underperform on cell types with different regulatory mechanisms. (Confidence: High)

Sixth, the interpretability analysis is shallow. Beyond recovering known transcription factor motifs, the authors should investigate whether the attention weights correlate with known regulatory relationships and whether they can predict the impact of perturbations on gene expression; a deeper analysis would increase confidence in the model's biological relevance. (Confidence: High)

Finally, the potential applications of EPInformer in biological research and medicine are not discussed concretely. Examples such as identifying novel regulatory elements, predicting the impact of genetic variants on gene expression, or developing new therapeutic strategies would highlight the model's practical value and potential impact on the field. (Confidence: High)
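To make the requested attention-weight analysis concrete, here is a minimal single-head sketch of how a promoter embedding attending over candidate enhancer embeddings yields one score per enhancer, which can then be ranked against CRISPR-validated pairs. The embedding size and single-head form are illustrative assumptions; the paper's interaction encoder uses multi-head attention within transformer layers:

```python
import numpy as np

def enhancer_attention_scores(promoter_emb, enhancer_embs):
    """Single-head scaled dot-product attention: the promoter embedding is
    the query, each candidate enhancer embedding a key. Returns a softmax
    distribution over enhancers, usable for prioritization."""
    d = promoter_emb.shape[-1]
    logits = enhancer_embs @ promoter_emb / np.sqrt(d)  # (n_enhancers,)
    logits -= logits.max()                              # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

rng = np.random.default_rng(0)
promoter = rng.normal(size=64)          # hypothetical promoter embedding
enhancers = rng.normal(size=(10, 64))   # hypothetical candidate enhancers
scores = enhancer_attention_scores(promoter, enhancers)
ranked = np.argsort(scores)[::-1]       # enhancers ordered by attention weight
```

The reviewer's point is that such scores should be checked against known regulatory relationships (e.g., correlation with perturbation effect sizes), not only against the ABC score's precision-recall curve.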

💡 Suggestions

To address the identified weaknesses, I recommend several concrete improvements.

First, evaluate EPInformer on a wider range of cell lines, including lines with varying levels of epigenomic data availability, and analyze performance across gene categories (housekeeping genes, tissue-specific genes, and genes at different expression levels). This would give a fuller picture of the model's strengths, weaknesses, and potential biases.

Second, perform a controlled comparison in which all models are trained on the same input features, either by training Enformer and Xpresso with Hi-C data where feasible or by training EPInformer without it, so that the contribution of the architecture can be separated from that of the inputs.

Third, report the computational cost of EPInformer, including training time, memory usage, and scalability, compared against models such as Enformer, to substantiate the efficiency claims.

Fourth, discuss the model's limitations in more detail: potential biases in the training data, the challenges of applying the model to different cell types, and the impact of these limitations on performance and generalizability.

Fifth, deepen the interpretability analysis by testing whether the attention weights correlate with known regulatory relationships and can predict the impact of perturbations on gene expression, and by exploring complementary techniques such as saliency maps or feature importance scores.

Furthermore, the authors should discuss concrete applications of EPInformer in biological research and medicine, such as identifying novel regulatory elements, predicting the impact of genetic variants on gene expression, or developing new therapeutic strategies. Extending the model to regulatory elements beyond promoters and enhancers, such as insulators or domain boundaries, would broaden its scope, and integrating additional data types, such as transcription factor binding or RNA velocity data, could further improve its predictive power and biological relevance. A user-friendly interface or software package would also make EPInformer accessible to researchers without extensive deep learning experience and would facilitate its adoption.

Finally, ablation studies assessing the contribution of each input feature would identify the most important features and could lead to a more efficient model, and a more rigorous statistical analysis, including confidence intervals and p-values, would put the conclusions on a more solid foundation.
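The suggested confidence intervals could be obtained, for example, with a percentile bootstrap over genes. Below is a minimal sketch using synthetic predicted/observed values; the data and the bootstrap settings are hypothetical, intended only to illustrate the kind of statistical reporting requested:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))

def bootstrap_pearson_ci(pred, obs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the Pearson correlation between
    predicted and observed expression, resampling genes with replacement."""
    rng = np.random.default_rng(seed)
    n = len(pred)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[b] = pearson(pred[idx], obs[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Hypothetical predicted vs. observed log-expression values
rng = np.random.default_rng(1)
obs = rng.normal(size=500)
pred = 0.8 * obs + 0.6 * rng.normal(size=500)  # correlated by construction
lo, hi = bootstrap_pearson_ci(pred, obs, n_boot=500)
```

Reporting intervals like `(lo, hi)` alongside each headline correlation, per cell line and per assay, would let readers judge whether the gains over Enformer and Xpresso are statistically meaningful.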

❓ Questions

Several key questions remain unanswered, and addressing them would significantly strengthen the paper.

1. How does EPInformer perform on cell lines with different levels of epigenomic data availability? The current evaluation uses two cell lines with relatively comprehensive data; it is unclear how the model would fare with sparser or lower-quality data.
2. How does performance vary across gene families or functional categories? Are there specific types of genes for which EPInformer performs particularly well or poorly?
3. What is the computational cost of EPInformer compared to other models, and how does it scale with the size of the input data? A detailed analysis of training time, memory usage, and scalability is needed.
4. What are the potential biases in the training data, and how might they affect performance? For example, is the model biased towards cell types with more available data, or towards genes with higher expression levels?
5. How can the attention weights be used to gain insight into the regulatory mechanisms underlying gene expression? Do they correlate with known regulatory relationships, and can they predict the impact of perturbations?
6. What are the most promising applications of EPInformer in biological research and medicine? Concrete examples of how the model advances our understanding of gene regulation and disease would help.
7. How does EPInformer handle novel or previously uncharacterized enhancers? Does it rely solely on known enhancer regions, or can it identify new regulatory elements?
8. How robust is EPInformer to noise or errors in the input data? How does performance change when inputs are perturbed or contain missing values?
9. Can the model predict the effects of genetic variants on gene expression? If so, how accurate are these predictions relative to dedicated variant-effect prediction methods?
10. What are the limitations of the current model, and what are the next steps for improving its performance and generalizability across different biological contexts?
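One way the authors could address the robustness-to-noise question is with an input-perturbation probe: inject Gaussian noise of increasing scale into the inputs and track how far the predictions drift from the clean ones. The sketch below uses a stand-in linear "model" purely for illustration, since EPInformer itself is not reproduced here; the noise scales are arbitrary assumptions:

```python
import numpy as np

def noise_robustness(predict, inputs, sigmas=(0.0, 0.1, 0.5), seed=0):
    """For each noise scale sigma, add Gaussian noise to the inputs and
    report the correlation between clean and noisy predictions."""
    rng = np.random.default_rng(seed)
    clean = predict(inputs)
    curve = {}
    for s in sigmas:
        noisy = predict(inputs + rng.normal(scale=s, size=inputs.shape)) if s > 0 else clean
        curve[s] = float(np.corrcoef(clean, noisy)[0, 1])
    return curve

# Hypothetical stand-in model: linear map over 8 input features
w = np.linspace(0.1, 1.0, 8)
predict = lambda X: X @ w
X = np.random.default_rng(2).normal(size=(200, 8))
curve = noise_robustness(predict, X)
```

A slowly decaying curve would indicate graceful degradation; a sharp drop at small sigma would flag sensitivity to measurement noise in the epigenomic tracks.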

📊 Scores

Soundness: 3.0
Presentation: 3.0
Contribution: 2.5
Rating: 5.75
