Research Article | RESEARCH METHODS

# Neyman-Pearson classification algorithms and NP receiver operating characteristics


Science Advances  02 Feb 2018:
Vol. 4, no. 2, eaao1659

### Tables

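• Algorithm 1. An NP umbrella algorithm

1: Input:
    Training data: a mixed i.i.d. sample $\mathcal{S} = \mathcal{S}^0 \cup \mathcal{S}^1$, where $\mathcal{S}^0$ and $\mathcal{S}^1$ are class 0 and class 1 samples, respectively
    $\alpha$: type I error upper bound, $0 \le \alpha \le 1$ (default $\alpha = 0.05$)
    $\delta$: a small tolerance level, $0 < \delta < 1$ (default $\delta = 0.05$)
    $M$: number of random splits on $\mathcal{S}^0$ (default $M = 1$)
2: Function RankThreshold($n$, $\alpha$, $\delta$)
3:     For $k$ in $\{1, \cdots, n\}$ do ◃ For each rank threshold candidate $k$
4:         $v(k) \leftarrow \sum_{j=k}^{n} \binom{n}{j} (1-\alpha)^{j} \alpha^{n-j}$ ◃ Calculate the violation rate upper bound
5:     $k^* \leftarrow \min\{k \in \{1, \cdots, n\} : v(k) \le \delta\}$ ◃ Pick the rank threshold
6:     Return $k^*$
7: Procedure NPClassifier($\mathcal{S}$, $\alpha$, $\delta$, $M$)
8:     $n \leftarrow |\mathcal{S}^0| / 2$ ◃ Denote half of the size of $\mathcal{S}^0$ as $n$
9:     $k^* \leftarrow$ RankThreshold($n$, $\alpha$, $\delta$) ◃ Find the rank threshold
10:     For $i$ in $\{1, \cdots, M\}$ do ◃ Randomly split $\mathcal{S}^0$ for $M$ times
11:         $\mathcal{S}^0_{i,1}, \mathcal{S}^0_{i,2} \leftarrow$ random split on $\mathcal{S}^0$ ◃ Each time randomly split $\mathcal{S}^0$ into two halves with equal sizes
12:         $\mathcal{S}_i \leftarrow \mathcal{S}^0_{i,1} \cup \mathcal{S}^1$ ◃ Combine $\mathcal{S}^0_{i,1}$ and $\mathcal{S}^1$
13:         $\mathcal{S}^0_{i,2} = \{x_1, \cdots, x_n\}$ ◃ Write $\mathcal{S}^0_{i,2}$ as a set of $n$ data points
14:         $f_i \leftarrow$ ClassificationAlgorithm($\mathcal{S}_i$) ◃ Train a scoring function $f_i$ on $\mathcal{S}_i$
15:         $\mathcal{T}_i = \{t_{i,1}, \cdots, t_{i,n}\} \leftarrow \{f_i(x_1), \cdots, f_i(x_n)\}$ ◃ Apply the scoring function $f_i$ to $\mathcal{S}^0_{i,2}$ to obtain a set of score threshold candidates
16:         $\{t_{i,(1)}, \cdots, t_{i,(n)}\} \leftarrow$ sort($\mathcal{T}_i$) ◃ Sort elements of $\mathcal{T}_i$ in an increasing order
17:         $t_i^* \leftarrow t_{i,(k^*)}$ ◃ Find the score threshold corresponding to the rank threshold $k^*$
18:         $\hat{\phi}_i(X) = \mathbb{1}(f_i(X) > t_i^*)$ ◃ Construct an NP classifier based on the scoring function $f_i$ and the threshold $t_i^*$
19: Output: an ensemble NP classifier $\hat{\phi}_\alpha(X) = \mathbb{1}\left(\frac{1}{M} \sum_{i=1}^{M} \hat{\phi}_i(X) \ge \frac{1}{2}\right)$ ◃ By majority vote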
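The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the nproc package's implementation: the names `rank_threshold`, `np_classifier`, `mean_difference_scorer`, and the `seed` argument are hypothetical, the data are one-dimensional, and the base classification algorithm is a toy mean-difference score standing in for LR, SVM, RF, and the like. The binomial tail is computed directly with `math.comb`, which is only practical for moderate n.

```python
import random
from math import comb


def rank_threshold(n, alpha=0.05, delta=0.05):
    """Return the smallest rank k in 1..n with v(k) <= delta, where
    v(k) = sum_{j=k}^{n} C(n, j) * (1 - alpha)**j * alpha**(n - j)."""
    tail, k_star = 0.0, None
    for k in range(n, 0, -1):  # v(k) grows as k decreases, so scan from the top
        tail += comb(n, k) * (1 - alpha) ** k * alpha ** (n - k)  # tail == v(k)
        if tail > delta:
            break
        k_star = k
    if k_star is None:
        raise ValueError("left-out class 0 sample is too small for this alpha and delta")
    return k_star


def np_classifier(class0, class1, train_scorer, alpha=0.05, delta=0.05, M=1, seed=0):
    """Build an ensemble NP classifier from M random splits of the class 0 sample."""
    rng = random.Random(seed)          # hypothetical argument, for reproducible splits
    n = len(class0) // 2               # half of the class 0 sample size
    k_star = rank_threshold(n, alpha, delta)
    members = []
    for _ in range(M):
        shuffled = list(class0)
        rng.shuffle(shuffled)
        left_out, train0 = shuffled[:n], shuffled[n:]
        f = train_scorer(train0, class1)           # scoring function f_i
        scores = sorted(f(x) for x in left_out)    # threshold candidates, increasing
        members.append((f, scores[k_star - 1]))    # k*-th order statistic as threshold

    def classify(x):
        votes = sum(f(x) > t for f, t in members)
        return int(2 * votes >= len(members))      # majority vote, ties go to class 1

    return classify


def mean_difference_scorer(train0, train1):
    """Toy 1-D base algorithm (an assumption): signed distance from the midpoint."""
    mean0 = sum(train0) / len(train0)
    mean1 = sum(train1) / len(train1)
    sign = 1.0 if mean1 >= mean0 else -1.0
    return lambda x: sign * (x - (mean0 + mean1) / 2)


# Synthetic data for illustration only
data_rng = random.Random(1)
class0 = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
class1 = [data_rng.gauss(2.0, 1.0) for _ in range(200)]
classify = np_classifier(class0, class1, mean_difference_scorer, M=5)
```

With the defaults α = δ = 0.05, `rank_threshold` needs at least n = 59 left-out class 0 observations, since v(n) = (1 − α)^n must not exceed δ; for smaller n the sketch raises a `ValueError` rather than returning a threshold that cannot meet the tolerance.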

### Supplementary Materials

Proof of Proposition 1

Conditional type II error bounds in NP-ROC bands

Empirical ROC curves versus NP-ROC bands in guiding users to choose classifiers to satisfy type I error control

Effects of majority voting on the type I and II errors of the ensemble classifier

table S1. Results of LR in Simulation S2.

table S2. Results of SVMs in Simulation S2.

table S3. Results of RFs in Simulation S2.

table S4. Results of NB in Simulation S2.

table S5. Results of LDA in Simulation S2.

table S6. Results of AdaBoost in Simulation S2.

table S7. Description of variables used in real data application 1.

table S8. The performance of the NP umbrella algorithm in real data application 2.

table S9. Input information of the nproc package (version 2.0.9).

R codes and data sets

The supplementary PDF file includes the items listed above, from the proof of Proposition 1 through table S9.