Improving Peer Review Outcome Prediction in Scientific Manuscripts Using We-Sentences

M P Gokul Krishna; E Pavithra; K Bhaskar

doi:10.64751/q89j2f44

Keywords:

Peer review, automated prediction, transformer embeddings, TF-IDF features, ELECTRA model, we-sentence weighting, explainability tools, Flask interface.

Abstract

Peer review is crucial for upholding quality in scientific publishing, but rising submission volumes and subjective discrepancies between reviewers make uniform assessment difficult. Manual decision-making is slow and hard to scale, which motivates automated prediction of review outcomes from linguistic and structural indicators in academic articles. The PeerRead and ASAP-Review datasets were standardized into consistent fields (title, abstract, body text, and conclusion) and then preprocessed by removing LaTeX syntax, citations, URLs, and special characters, followed by lowercasing, punctuation normalization, and sentence segmentation. A dedicated extraction step collected all we-sentences, enabling three input configurations: full-text, we-only, and weighted. Features were represented using TF-IDF with modified sentence weighting and transformer-based embeddings produced by BART, DGCBERT, SciBERT, and Longformer, each subject to its tokenization limits. The models comprised LinearSVC variants using TF and TF-IDF, an SVM with EXPEI weighting, fine-tuned transformer architectures, and a high-performing ELECTRA-based method. Evaluated on accuracy, precision, recall, and F1-score, ELECTRA achieved 96.3% accuracy on PeerRead and 89.9% on ASAP-Review, surpassing all baseline models. Explainability was integrated using LIME and SHAP to highlight influential features, while a Flask-based interface supported end-to-end prediction, preprocessing, embedding, and display of topic-modeling results. The results indicate a quantifiable improvement in forecasting peer review outcomes through focused language emphasis and advanced embedding methods.
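
The preprocessing and we-sentence extraction steps can be illustrated with a minimal Python sketch. The abstract does not give the exact rules, so everything here is an assumption made for illustration: the function names (`clean_text`, `split_sentences`, `build_configs`) are hypothetical, a we-sentence is taken to be any sentence containing the token "we", and the weighted configuration is approximated by repeating each we-sentence `weight` times so that its terms count more in a later TF or TF-IDF step.

```python
import re

# Assumption: a "we-sentence" is any sentence containing the standalone token "we".
WE_PATTERN = re.compile(r"\bwe\b")


def clean_text(text: str) -> str:
    """Rough cleanup mirroring the steps listed in the abstract."""
    text = re.sub(r"https?://\S+", " ", text)            # URLs
    text = re.sub(r"\\cite\w*\{[^{}]*\}", " ", text)     # LaTeX citation commands
    text = re.sub(r"\\[a-zA-Z]+", " ", text)             # other LaTeX command names
    text = re.sub(r"\[[\d,\s]+\]", " ", text)            # numeric citations such as [1, 2]
    text = re.sub(r"[^a-z0-9.!?,'\s-]", " ", text.lower())  # lowercase, drop special characters
    text = re.sub(r"\s+", " ", text).strip()             # collapse whitespace
    return re.sub(r"\s+([.,!?])", r"\1", text)           # no space before punctuation


def split_sentences(text: str) -> list[str]:
    """Naive segmentation on sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def build_configs(raw: str, weight: int = 2) -> dict[str, str]:
    """Return the full-text, we-only, and weighted variants of one document."""
    sentences = split_sentences(clean_text(raw))
    we_sentences = [s for s in sentences if WE_PATTERN.search(s)]
    weighted = []
    for s in sentences:
        # Assumption: weighting is emulated by repeating we-sentences.
        weighted.extend([s] * (weight if WE_PATTERN.search(s) else 1))
    return {
        "full": " ".join(sentences),
        "we_only": " ".join(we_sentences),
        "weighted": " ".join(weighted),
    }


if __name__ == "__main__":
    sample = (
        "We propose a model \\cite{smith2020}. The results are strong [1, 2]. "
        "See https://example.com for code. Our method works, and we evaluate it."
    )
    for name, text in build_configs(sample).items():
        print(f"{name}: {text}")
```

Each of the three strings could then be passed to a TF-IDF vectorizer or a transformer tokenizer. Repetition is only one way to realize sentence weighting; scaling the TF-IDF rows of we-sentences directly would be an alternative.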