# README

This repository contains code and data for "The Impact of Peer Review on the Contribution Potential of Scientific Papers."

# Library requirements and versions
## Python
- python: 3.6.10
- numpy:  1.15.4
- pandas:  1.0.4
- statsmodels:  0.11.1
- nltk:  3.5
- textblob:  0.15.3
- tqdm:  4.46.1
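
To compare an existing environment against the versions listed above, something like the following sketch may help. The names `EXPECTED_VERSIONS` and `find_mismatches` are hypothetical and not part of this repository, and the sketch assumes each package exposes a `__version__` attribute.

```python
import importlib

# Python package versions as listed in this README.
EXPECTED_VERSIONS = {
    "numpy": "1.15.4",
    "pandas": "1.0.4",
    "statsmodels": "0.11.1",
    "nltk": "3.5",
    "textblob": "0.15.3",
    "tqdm": "4.46.1",
}


def find_mismatches(expected):
    """Return {package: (expected, found)} for every package whose version differs.

    `found` is None when the package cannot be imported or has no __version__.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            mismatches[name] = (wanted, None)
            continue
        found = getattr(module, "__version__", None)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED_VERSIONS).items():
        print(f"{name}: README lists {wanted}, found {found}")
```

A mismatch does not necessarily mean the code will fail; it only flags where the environment differs from the one listed here.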

## R
- R: 3.5.3
- survival: 2.44.1.1
- stargazer: 5.2.2

# Code for preprocessing and analysis

## Regression analysis

The preprocessed data is already provided in `data_for_analysis`, so the analyses below can be run without preprocessing:

### For Table 3
   run `analysis_code/sentiment_citaion_mixed_effect.ipynb` 
   
### For Table 2, A2-1, A2-2
   run `analysis_code/revise_decision_logit.R`
  
## LDA (Figure 2 and A4-1)
  - Download `lda_text_diff.txt` from [Google Drive](https://drive.google.com/file/d/1Aazyqpn1jbq77U9QzM-TE9HaCC2mzFtk/view?usp=sharing), unzip it, and put it in `LDA/`
  - Run `diff_topic_analysis.ipynb`


# Preprocessing

To re-run all analyses, including preprocessing:

From [Google Drive](https://drive.google.com/file/d/1Aazyqpn1jbq77U9QzM-TE9HaCC2mzFtk/view?usp=sharing), download the .zip file and unzip it.

Then:
- Put `peerj_review_data.json` in `data/raw_data`
- Put `diff_content.json` in `LDA/`
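
Before starting the preprocessing, it can be useful to confirm that both files are where the steps above put them. This is a minimal sketch; `REQUIRED_INPUTS` and `missing_inputs` are hypothetical names, not part of this repository, and the paths are assumed to be relative to the repository root.

```python
from pathlib import Path

# Expected locations, taken from the steps above (relative to the repository root).
REQUIRED_INPUTS = [
    "data/raw_data/peerj_review_data.json",
    "LDA/diff_content.json",
]


def missing_inputs(repo_root=".", required=REQUIRED_INPUTS):
    """Return the required input files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_inputs():
        print(f"Missing: {rel}")
```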

Run the following scripts (collected in `pre.sh`):
- `prepro/prepro_review_data.py`
- `prepro/individual_sentiment_calculation.py`
- `prepro/reviwers_authors_sentiment.py`
- `prepro/individual_sentiment_calculation_vader.py`
- `prepro/reviwers_authors_sentiment_vader.py`
- `prepro/round_count.py`
- `prepro/revision_decision_logit_data.py`
- `LDA/lda_prepro.py`
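
As an alternative to `pre.sh`, the scripts can be run in the listed order from Python. The sketch below is an illustration only: `PREPRO_SCRIPTS` and `run_all` are hypothetical names, and it is not known whether `pre.sh` invokes the scripts in exactly this way or from which working directory.

```python
import subprocess
import sys

# Script order as listed above.
PREPRO_SCRIPTS = [
    "prepro/prepro_review_data.py",
    "prepro/individual_sentiment_calculation.py",
    "prepro/reviwers_authors_sentiment.py",
    "prepro/individual_sentiment_calculation_vader.py",
    "prepro/reviwers_authors_sentiment_vader.py",
    "prepro/round_count.py",
    "prepro/revision_decision_logit_data.py",
    "LDA/lda_prepro.py",
]


def run_all(scripts, python=sys.executable):
    """Run each script with the given interpreter, stopping at the first failure."""
    for script in scripts:
        print(f"Running {script}")
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later steps never run on incomplete intermediate data.
        subprocess.run([python, script], check=True)
```

Calling `run_all(PREPRO_SCRIPTS)` from the repository root would run the steps in order; whether the scripts expect that working directory is an assumption.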

After preprocessing finishes, run the analysis code.

## Regression analysis
### For Table 3 and 4
   run `analysis_code/sentiment_citaion_mixed_effect.ipynb`
   
### For Table 2, A2-1 and A2-2
   run `analysis_code/revise_decision_logit.R`
   
### Appendix

### Vader sentiment calculation and Altmetric score
   run `analysis_code/sentiment_citaion_mixed_effect.ipynb`

## LDA (Figure 2 and A4-1)
 - run `diff_topic_analysis.ipynb`