In this practical, we are going to learn about feature selection and dimensionality reduction methods for text data.
Today we will use the following libraries. Make sure they are installed before you start!
from sklearn.datasets import load_files
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import RegexpTokenizer
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn import metrics
from sklearn.naive_bayes import MultinomialNB
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
import matplotlib.pyplot as plt
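Before running the imports, it can help to check that the packages are actually available. The snippet below is a minimal sketch, not part of the original practical: the helper name `missing_packages` and the list `REQUIRED` are made up for illustration, and it only checks top-level package names, not versions.

```python
from importlib.util import find_spec

# Top-level package names behind the imports above.
# Note: scikit-learn is imported under the name "sklearn".
REQUIRED = ["sklearn", "pandas", "numpy", "nltk", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any package reported as missing can be installed with pip; keep in mind that `sklearn` is distributed on PyPI as `scikit-learn`.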
1. Here we use a news article data set from the BBC News website, provided for benchmarking machine learning algorithms. It consists of 2,225 documents in 5 categories: business, entertainment, politics, sport, and tech. Upload the data.zip file and extract it using the code below.
# for reproducibility
random_state = 321
!unzip /content/data.zip  # if asked whether to replace existing files, enter [N] (none)
Archive: /content/data.zip creating: bbc/ creating: bbc/business/ inflating: bbc/business/001.txt inflating: bbc/business/002.txt inflating: bbc/business/003.txt inflating: bbc/business/004.txt inflating: bbc/business/005.txt inflating: bbc/business/006.txt inflating: bbc/business/007.txt inflating: bbc/business/008.txt inflating: bbc/business/009.txt inflating: bbc/business/010.txt inflating: bbc/business/011.txt inflating: bbc/business/012.txt inflating: bbc/business/013.txt inflating: bbc/business/014.txt inflating: bbc/business/015.txt inflating: bbc/business/016.txt inflating: bbc/business/017.txt inflating: bbc/business/018.txt inflating: bbc/business/019.txt inflating: bbc/business/020.txt inflating: bbc/business/021.txt inflating: bbc/business/022.txt inflating: bbc/business/023.txt inflating: bbc/business/024.txt inflating: bbc/business/025.txt inflating: bbc/business/026.txt inflating: bbc/business/027.txt inflating: bbc/business/028.txt inflating: bbc/business/029.txt inflating: bbc/business/030.txt inflating: bbc/business/031.txt inflating: bbc/business/032.txt inflating: bbc/business/033.txt inflating: bbc/business/034.txt inflating: bbc/business/035.txt inflating: bbc/business/036.txt inflating: bbc/business/037.txt inflating: bbc/business/038.txt inflating: bbc/business/039.txt inflating: bbc/business/040.txt inflating: bbc/business/041.txt inflating: bbc/business/042.txt inflating: bbc/business/043.txt inflating: bbc/business/044.txt inflating: bbc/business/045.txt inflating: bbc/business/046.txt inflating: bbc/business/047.txt inflating: bbc/business/048.txt inflating: bbc/business/049.txt inflating: bbc/business/050.txt inflating: bbc/business/051.txt inflating: bbc/business/052.txt inflating: bbc/business/053.txt inflating: bbc/business/054.txt inflating: bbc/business/055.txt inflating: bbc/business/056.txt inflating: bbc/business/057.txt inflating: bbc/business/058.txt inflating: bbc/business/059.txt inflating: bbc/business/060.txt inflating: 
bbc/business/061.txt inflating: bbc/business/062.txt inflating: bbc/business/063.txt inflating: bbc/business/064.txt inflating: bbc/business/065.txt inflating: bbc/business/066.txt inflating: bbc/business/067.txt inflating: bbc/business/068.txt inflating: bbc/business/069.txt inflating: bbc/business/070.txt inflating: bbc/business/071.txt inflating: bbc/business/072.txt inflating: bbc/business/073.txt inflating: bbc/business/074.txt inflating: bbc/business/075.txt inflating: bbc/business/076.txt inflating: bbc/business/077.txt inflating: bbc/business/078.txt inflating: bbc/business/079.txt inflating: bbc/business/080.txt inflating: bbc/business/081.txt inflating: bbc/business/082.txt inflating: bbc/business/083.txt inflating: bbc/business/084.txt inflating: bbc/business/085.txt inflating: bbc/business/086.txt inflating: bbc/business/087.txt inflating: bbc/business/088.txt inflating: bbc/business/089.txt inflating: bbc/business/090.txt inflating: bbc/business/091.txt inflating: bbc/business/092.txt inflating: bbc/business/093.txt inflating: bbc/business/094.txt inflating: bbc/business/095.txt inflating: bbc/business/096.txt inflating: bbc/business/097.txt inflating: bbc/business/098.txt inflating: bbc/business/099.txt inflating: bbc/business/100.txt inflating: bbc/business/101.txt inflating: bbc/business/102.txt inflating: bbc/business/103.txt inflating: bbc/business/104.txt inflating: bbc/business/105.txt inflating: bbc/business/106.txt inflating: bbc/business/107.txt inflating: bbc/business/108.txt inflating: bbc/business/109.txt inflating: bbc/business/110.txt inflating: bbc/business/111.txt inflating: bbc/business/112.txt inflating: bbc/business/113.txt inflating: bbc/business/114.txt inflating: bbc/business/115.txt inflating: bbc/business/116.txt inflating: bbc/business/117.txt inflating: bbc/business/118.txt inflating: bbc/business/119.txt inflating: bbc/business/120.txt inflating: bbc/business/121.txt inflating: bbc/business/122.txt inflating: 
bbc/business/123.txt inflating: bbc/business/124.txt inflating: bbc/business/125.txt inflating: bbc/business/126.txt inflating: bbc/business/127.txt inflating: bbc/business/128.txt inflating: bbc/business/129.txt inflating: bbc/business/130.txt inflating: bbc/business/131.txt inflating: bbc/business/132.txt inflating: bbc/business/133.txt inflating: bbc/business/134.txt inflating: bbc/business/135.txt inflating: bbc/business/136.txt inflating: bbc/business/137.txt inflating: bbc/business/138.txt inflating: bbc/business/139.txt inflating: bbc/business/140.txt inflating: bbc/business/141.txt inflating: bbc/business/142.txt inflating: bbc/business/143.txt inflating: bbc/business/144.txt inflating: bbc/business/145.txt inflating: bbc/business/146.txt inflating: bbc/business/147.txt inflating: bbc/business/148.txt inflating: bbc/business/149.txt inflating: bbc/business/150.txt inflating: bbc/business/151.txt inflating: bbc/business/152.txt inflating: bbc/business/153.txt inflating: bbc/business/154.txt inflating: bbc/business/155.txt inflating: bbc/business/156.txt inflating: bbc/business/157.txt inflating: bbc/business/158.txt inflating: bbc/business/159.txt inflating: bbc/business/160.txt inflating: bbc/business/161.txt inflating: bbc/business/162.txt inflating: bbc/business/163.txt inflating: bbc/business/164.txt inflating: bbc/business/165.txt inflating: bbc/business/166.txt inflating: bbc/business/167.txt inflating: bbc/business/168.txt inflating: bbc/business/169.txt inflating: bbc/business/170.txt inflating: bbc/business/171.txt inflating: bbc/business/172.txt inflating: bbc/business/173.txt inflating: bbc/business/174.txt inflating: bbc/business/175.txt inflating: bbc/business/176.txt inflating: bbc/business/177.txt inflating: bbc/business/178.txt inflating: bbc/business/179.txt inflating: bbc/business/180.txt inflating: bbc/business/181.txt inflating: bbc/business/182.txt inflating: bbc/business/183.txt inflating: bbc/business/184.txt inflating: 
bbc/business/185.txt inflating: bbc/business/186.txt inflating: bbc/business/187.txt inflating: bbc/business/188.txt inflating: bbc/business/189.txt inflating: bbc/business/190.txt inflating: bbc/business/191.txt inflating: bbc/business/192.txt inflating: bbc/business/193.txt inflating: bbc/business/194.txt inflating: bbc/business/195.txt inflating: bbc/business/196.txt inflating: bbc/business/197.txt inflating: bbc/business/198.txt inflating: bbc/business/199.txt inflating: bbc/business/200.txt inflating: bbc/business/201.txt inflating: bbc/business/202.txt inflating: bbc/business/203.txt inflating: bbc/business/204.txt inflating: bbc/business/205.txt inflating: bbc/business/206.txt inflating: bbc/business/207.txt inflating: bbc/business/208.txt inflating: bbc/business/209.txt inflating: bbc/business/210.txt inflating: bbc/business/211.txt inflating: bbc/business/212.txt inflating: bbc/business/213.txt inflating: bbc/business/214.txt inflating: bbc/business/215.txt inflating: bbc/business/216.txt inflating: bbc/business/217.txt inflating: bbc/business/218.txt inflating: bbc/business/219.txt inflating: bbc/business/220.txt inflating: bbc/business/221.txt inflating: bbc/business/222.txt inflating: bbc/business/223.txt inflating: bbc/business/224.txt inflating: bbc/business/225.txt inflating: bbc/business/226.txt inflating: bbc/business/227.txt inflating: bbc/business/228.txt inflating: bbc/business/229.txt inflating: bbc/business/230.txt inflating: bbc/business/231.txt inflating: bbc/business/232.txt inflating: bbc/business/233.txt inflating: bbc/business/234.txt inflating: bbc/business/235.txt inflating: bbc/business/236.txt inflating: bbc/business/237.txt inflating: bbc/business/238.txt inflating: bbc/business/239.txt inflating: bbc/business/240.txt inflating: bbc/business/241.txt inflating: bbc/business/242.txt inflating: bbc/business/243.txt inflating: bbc/business/244.txt inflating: bbc/business/245.txt inflating: bbc/business/246.txt inflating: 
bbc/business/247.txt inflating: bbc/business/248.txt inflating: bbc/business/249.txt inflating: bbc/business/250.txt inflating: bbc/business/251.txt inflating: bbc/business/252.txt inflating: bbc/business/253.txt inflating: bbc/business/254.txt inflating: bbc/business/255.txt inflating: bbc/business/256.txt inflating: bbc/business/257.txt inflating: bbc/business/258.txt inflating: bbc/business/259.txt inflating: bbc/business/260.txt inflating: bbc/business/261.txt inflating: bbc/business/262.txt inflating: bbc/business/263.txt inflating: bbc/business/264.txt inflating: bbc/business/265.txt inflating: bbc/business/266.txt inflating: bbc/business/267.txt inflating: bbc/business/268.txt inflating: bbc/business/269.txt inflating: bbc/business/270.txt inflating: bbc/business/271.txt inflating: bbc/business/272.txt inflating: bbc/business/273.txt inflating: bbc/business/274.txt inflating: bbc/business/275.txt inflating: bbc/business/276.txt inflating: bbc/business/277.txt inflating: bbc/business/278.txt inflating: bbc/business/279.txt inflating: bbc/business/280.txt inflating: bbc/business/281.txt inflating: bbc/business/282.txt inflating: bbc/business/283.txt inflating: bbc/business/284.txt inflating: bbc/business/285.txt inflating: bbc/business/286.txt inflating: bbc/business/287.txt inflating: bbc/business/288.txt inflating: bbc/business/289.txt inflating: bbc/business/290.txt inflating: bbc/business/291.txt inflating: bbc/business/292.txt inflating: bbc/business/293.txt inflating: bbc/business/294.txt inflating: bbc/business/295.txt inflating: bbc/business/296.txt inflating: bbc/business/297.txt inflating: bbc/business/298.txt inflating: bbc/business/299.txt inflating: bbc/business/300.txt inflating: bbc/business/301.txt inflating: bbc/business/302.txt inflating: bbc/business/303.txt inflating: bbc/business/304.txt inflating: bbc/business/305.txt inflating: bbc/business/306.txt inflating: bbc/business/307.txt inflating: bbc/business/308.txt inflating: 
bbc/business/309.txt inflating: bbc/business/310.txt inflating: bbc/business/311.txt inflating: bbc/business/312.txt inflating: bbc/business/313.txt inflating: bbc/business/314.txt inflating: bbc/business/315.txt inflating: bbc/business/316.txt inflating: bbc/business/317.txt inflating: bbc/business/318.txt inflating: bbc/business/319.txt inflating: bbc/business/320.txt inflating: bbc/business/321.txt inflating: bbc/business/322.txt inflating: bbc/business/323.txt inflating: bbc/business/324.txt inflating: bbc/business/325.txt inflating: bbc/business/326.txt inflating: bbc/business/327.txt inflating: bbc/business/328.txt inflating: bbc/business/329.txt inflating: bbc/business/330.txt inflating: bbc/business/331.txt inflating: bbc/business/332.txt inflating: bbc/business/333.txt inflating: bbc/business/334.txt inflating: bbc/business/335.txt inflating: bbc/business/336.txt inflating: bbc/business/337.txt inflating: bbc/business/338.txt inflating: bbc/business/339.txt inflating: bbc/business/340.txt inflating: bbc/business/341.txt inflating: bbc/business/342.txt inflating: bbc/business/343.txt inflating: bbc/business/344.txt inflating: bbc/business/345.txt inflating: bbc/business/346.txt inflating: bbc/business/347.txt inflating: bbc/business/348.txt inflating: bbc/business/349.txt inflating: bbc/business/350.txt inflating: bbc/business/351.txt inflating: bbc/business/352.txt inflating: bbc/business/353.txt inflating: bbc/business/354.txt inflating: bbc/business/355.txt inflating: bbc/business/356.txt inflating: bbc/business/357.txt inflating: bbc/business/358.txt inflating: bbc/business/359.txt inflating: bbc/business/360.txt inflating: bbc/business/361.txt inflating: bbc/business/362.txt inflating: bbc/business/363.txt inflating: bbc/business/364.txt inflating: bbc/business/365.txt inflating: bbc/business/366.txt inflating: bbc/business/367.txt inflating: bbc/business/368.txt inflating: bbc/business/369.txt inflating: bbc/business/370.txt inflating: 
bbc/business/371.txt inflating: bbc/business/372.txt inflating: bbc/business/373.txt inflating: bbc/business/374.txt inflating: bbc/business/375.txt inflating: bbc/business/376.txt inflating: bbc/business/377.txt inflating: bbc/business/378.txt inflating: bbc/business/379.txt inflating: bbc/business/380.txt inflating: bbc/business/381.txt inflating: bbc/business/382.txt inflating: bbc/business/383.txt inflating: bbc/business/384.txt inflating: bbc/business/385.txt inflating: bbc/business/386.txt inflating: bbc/business/387.txt inflating: bbc/business/388.txt inflating: bbc/business/389.txt inflating: bbc/business/390.txt inflating: bbc/business/391.txt inflating: bbc/business/392.txt inflating: bbc/business/393.txt inflating: bbc/business/394.txt inflating: bbc/business/395.txt inflating: bbc/business/396.txt inflating: bbc/business/397.txt inflating: bbc/business/398.txt inflating: bbc/business/399.txt inflating: bbc/business/400.txt inflating: bbc/business/401.txt inflating: bbc/business/402.txt inflating: bbc/business/403.txt inflating: bbc/business/404.txt inflating: bbc/business/405.txt inflating: bbc/business/406.txt inflating: bbc/business/407.txt inflating: bbc/business/408.txt inflating: bbc/business/409.txt inflating: bbc/business/410.txt inflating: bbc/business/411.txt inflating: bbc/business/412.txt inflating: bbc/business/413.txt inflating: bbc/business/414.txt inflating: bbc/business/415.txt inflating: bbc/business/416.txt inflating: bbc/business/417.txt inflating: bbc/business/418.txt inflating: bbc/business/419.txt inflating: bbc/business/420.txt inflating: bbc/business/421.txt inflating: bbc/business/422.txt inflating: bbc/business/423.txt inflating: bbc/business/424.txt inflating: bbc/business/425.txt inflating: bbc/business/426.txt inflating: bbc/business/427.txt inflating: bbc/business/428.txt inflating: bbc/business/429.txt inflating: bbc/business/430.txt inflating: bbc/business/431.txt inflating: bbc/business/432.txt inflating: 
bbc/business/433.txt inflating: bbc/business/434.txt inflating: bbc/business/435.txt inflating: bbc/business/436.txt inflating: bbc/business/437.txt inflating: bbc/business/438.txt inflating: bbc/business/439.txt inflating: bbc/business/440.txt inflating: bbc/business/441.txt inflating: bbc/business/442.txt inflating: bbc/business/443.txt inflating: bbc/business/444.txt inflating: bbc/business/445.txt inflating: bbc/business/446.txt inflating: bbc/business/447.txt inflating: bbc/business/448.txt inflating: bbc/business/449.txt inflating: bbc/business/450.txt inflating: bbc/business/451.txt inflating: bbc/business/452.txt inflating: bbc/business/453.txt inflating: bbc/business/454.txt inflating: bbc/business/455.txt inflating: bbc/business/456.txt inflating: bbc/business/457.txt inflating: bbc/business/458.txt inflating: bbc/business/459.txt inflating: bbc/business/460.txt inflating: bbc/business/461.txt inflating: bbc/business/462.txt inflating: bbc/business/463.txt inflating: bbc/business/464.txt inflating: bbc/business/465.txt inflating: bbc/business/466.txt inflating: bbc/business/467.txt inflating: bbc/business/468.txt inflating: bbc/business/469.txt inflating: bbc/business/470.txt inflating: bbc/business/471.txt inflating: bbc/business/472.txt inflating: bbc/business/473.txt inflating: bbc/business/474.txt inflating: bbc/business/475.txt inflating: bbc/business/476.txt inflating: bbc/business/477.txt inflating: bbc/business/478.txt inflating: bbc/business/479.txt inflating: bbc/business/480.txt inflating: bbc/business/481.txt inflating: bbc/business/482.txt inflating: bbc/business/483.txt inflating: bbc/business/484.txt inflating: bbc/business/485.txt inflating: bbc/business/486.txt inflating: bbc/business/487.txt inflating: bbc/business/488.txt inflating: bbc/business/489.txt inflating: bbc/business/490.txt inflating: bbc/business/491.txt inflating: bbc/business/492.txt inflating: bbc/business/493.txt inflating: bbc/business/494.txt inflating: 
bbc/business/495.txt inflating: bbc/business/496.txt inflating: bbc/business/497.txt inflating: bbc/business/498.txt inflating: bbc/business/499.txt inflating: bbc/business/500.txt inflating: bbc/business/501.txt inflating: bbc/business/502.txt inflating: bbc/business/503.txt inflating: bbc/business/504.txt inflating: bbc/business/505.txt inflating: bbc/business/506.txt inflating: bbc/business/507.txt inflating: bbc/business/508.txt inflating: bbc/business/509.txt inflating: bbc/business/510.txt creating: bbc/entertainment/ inflating: bbc/entertainment/001.txt inflating: bbc/entertainment/002.txt inflating: bbc/entertainment/003.txt inflating: bbc/entertainment/004.txt inflating: bbc/entertainment/005.txt inflating: bbc/entertainment/006.txt inflating: bbc/entertainment/007.txt inflating: bbc/entertainment/008.txt inflating: bbc/entertainment/009.txt inflating: bbc/entertainment/010.txt inflating: bbc/entertainment/011.txt inflating: bbc/entertainment/012.txt inflating: bbc/entertainment/013.txt inflating: bbc/entertainment/014.txt inflating: bbc/entertainment/015.txt inflating: bbc/entertainment/016.txt inflating: bbc/entertainment/017.txt inflating: bbc/entertainment/018.txt inflating: bbc/entertainment/019.txt inflating: bbc/entertainment/020.txt inflating: bbc/entertainment/021.txt inflating: bbc/entertainment/022.txt inflating: bbc/entertainment/023.txt inflating: bbc/entertainment/024.txt inflating: bbc/entertainment/025.txt inflating: bbc/entertainment/026.txt inflating: bbc/entertainment/027.txt inflating: bbc/entertainment/028.txt inflating: bbc/entertainment/029.txt inflating: bbc/entertainment/030.txt inflating: bbc/entertainment/031.txt inflating: bbc/entertainment/032.txt inflating: bbc/entertainment/033.txt inflating: bbc/entertainment/034.txt inflating: bbc/entertainment/035.txt inflating: bbc/entertainment/036.txt inflating: bbc/entertainment/037.txt inflating: bbc/entertainment/038.txt inflating: bbc/entertainment/039.txt inflating: 
bbc/entertainment/040.txt inflating: bbc/entertainment/041.txt inflating: bbc/entertainment/042.txt inflating: bbc/entertainment/043.txt inflating: bbc/entertainment/044.txt inflating: bbc/entertainment/045.txt inflating: bbc/entertainment/046.txt inflating: bbc/entertainment/047.txt inflating: bbc/entertainment/048.txt inflating: bbc/entertainment/049.txt inflating: bbc/entertainment/050.txt inflating: bbc/entertainment/051.txt inflating: bbc/entertainment/052.txt inflating: bbc/entertainment/053.txt inflating: bbc/entertainment/054.txt inflating: bbc/entertainment/055.txt inflating: bbc/entertainment/056.txt inflating: bbc/entertainment/057.txt inflating: bbc/entertainment/058.txt inflating: bbc/entertainment/059.txt inflating: bbc/entertainment/060.txt inflating: bbc/entertainment/061.txt inflating: bbc/entertainment/062.txt inflating: bbc/entertainment/063.txt inflating: bbc/entertainment/064.txt inflating: bbc/entertainment/065.txt inflating: bbc/entertainment/066.txt inflating: bbc/entertainment/067.txt inflating: bbc/entertainment/068.txt inflating: bbc/entertainment/069.txt inflating: bbc/entertainment/070.txt inflating: bbc/entertainment/071.txt inflating: bbc/entertainment/072.txt inflating: bbc/entertainment/073.txt inflating: bbc/entertainment/074.txt inflating: bbc/entertainment/075.txt inflating: bbc/entertainment/076.txt inflating: bbc/entertainment/077.txt inflating: bbc/entertainment/078.txt inflating: bbc/entertainment/079.txt inflating: bbc/entertainment/080.txt inflating: bbc/entertainment/081.txt inflating: bbc/entertainment/082.txt inflating: bbc/entertainment/083.txt inflating: bbc/entertainment/084.txt inflating: bbc/entertainment/085.txt inflating: bbc/entertainment/086.txt inflating: bbc/entertainment/087.txt inflating: bbc/entertainment/088.txt inflating: bbc/entertainment/089.txt inflating: bbc/entertainment/090.txt inflating: bbc/entertainment/091.txt inflating: bbc/entertainment/092.txt inflating: bbc/entertainment/093.txt inflating: 
bbc/entertainment/094.txt inflating: bbc/entertainment/095.txt inflating: bbc/entertainment/096.txt inflating: bbc/entertainment/097.txt inflating: bbc/entertainment/098.txt inflating: bbc/entertainment/099.txt inflating: bbc/entertainment/100.txt inflating: bbc/entertainment/101.txt inflating: bbc/entertainment/102.txt inflating: bbc/entertainment/103.txt inflating: bbc/entertainment/104.txt inflating: bbc/entertainment/105.txt inflating: bbc/entertainment/106.txt inflating: bbc/entertainment/107.txt inflating: bbc/entertainment/108.txt inflating: bbc/entertainment/109.txt inflating: bbc/entertainment/110.txt inflating: bbc/entertainment/111.txt inflating: bbc/entertainment/112.txt inflating: bbc/entertainment/113.txt inflating: bbc/entertainment/114.txt inflating: bbc/entertainment/115.txt inflating: bbc/entertainment/116.txt inflating: bbc/entertainment/117.txt inflating: bbc/entertainment/118.txt inflating: bbc/entertainment/119.txt inflating: bbc/entertainment/120.txt inflating: bbc/entertainment/121.txt inflating: bbc/entertainment/122.txt inflating: bbc/entertainment/123.txt inflating: bbc/entertainment/124.txt inflating: bbc/entertainment/125.txt inflating: bbc/entertainment/126.txt inflating: bbc/entertainment/127.txt inflating: bbc/entertainment/128.txt inflating: bbc/entertainment/129.txt inflating: bbc/entertainment/130.txt inflating: bbc/entertainment/131.txt inflating: bbc/entertainment/132.txt inflating: bbc/entertainment/133.txt inflating: bbc/entertainment/134.txt inflating: bbc/entertainment/135.txt inflating: bbc/entertainment/136.txt inflating: bbc/entertainment/137.txt inflating: bbc/entertainment/138.txt inflating: bbc/entertainment/139.txt inflating: bbc/entertainment/140.txt inflating: bbc/entertainment/141.txt inflating: bbc/entertainment/142.txt inflating: bbc/entertainment/143.txt inflating: bbc/entertainment/144.txt inflating: bbc/entertainment/145.txt inflating: bbc/entertainment/146.txt inflating: bbc/entertainment/147.txt inflating: 
bbc/entertainment/148.txt inflating: bbc/entertainment/149.txt inflating: bbc/entertainment/150.txt inflating: bbc/entertainment/151.txt inflating: bbc/entertainment/152.txt inflating: bbc/entertainment/153.txt inflating: bbc/entertainment/154.txt inflating: bbc/entertainment/155.txt inflating: bbc/entertainment/156.txt inflating: bbc/entertainment/157.txt inflating: bbc/entertainment/158.txt inflating: bbc/entertainment/159.txt inflating: bbc/entertainment/160.txt inflating: bbc/entertainment/161.txt inflating: bbc/entertainment/162.txt inflating: bbc/entertainment/163.txt inflating: bbc/entertainment/164.txt inflating: bbc/entertainment/165.txt inflating: bbc/entertainment/166.txt inflating: bbc/entertainment/167.txt inflating: bbc/entertainment/168.txt inflating: bbc/entertainment/169.txt inflating: bbc/entertainment/170.txt inflating: bbc/entertainment/171.txt inflating: bbc/entertainment/172.txt inflating: bbc/entertainment/173.txt inflating: bbc/entertainment/174.txt inflating: bbc/entertainment/175.txt inflating: bbc/entertainment/176.txt inflating: bbc/entertainment/177.txt inflating: bbc/entertainment/178.txt inflating: bbc/entertainment/179.txt inflating: bbc/entertainment/180.txt inflating: bbc/entertainment/181.txt inflating: bbc/entertainment/182.txt inflating: bbc/entertainment/183.txt inflating: bbc/entertainment/184.txt inflating: bbc/entertainment/185.txt inflating: bbc/entertainment/186.txt inflating: bbc/entertainment/187.txt inflating: bbc/entertainment/188.txt inflating: bbc/entertainment/189.txt inflating: bbc/entertainment/190.txt inflating: bbc/entertainment/191.txt inflating: bbc/entertainment/192.txt inflating: bbc/entertainment/193.txt inflating: bbc/entertainment/194.txt inflating: bbc/entertainment/195.txt inflating: bbc/entertainment/196.txt inflating: bbc/entertainment/197.txt inflating: bbc/entertainment/198.txt inflating: bbc/entertainment/199.txt inflating: bbc/entertainment/200.txt inflating: bbc/entertainment/201.txt inflating: 
bbc/entertainment/202.txt inflating: bbc/entertainment/203.txt inflating: bbc/entertainment/204.txt inflating: bbc/entertainment/205.txt inflating: bbc/entertainment/206.txt inflating: bbc/entertainment/207.txt inflating: bbc/entertainment/208.txt inflating: bbc/entertainment/209.txt inflating: bbc/entertainment/210.txt inflating: bbc/entertainment/211.txt inflating: bbc/entertainment/212.txt inflating: bbc/entertainment/213.txt inflating: bbc/entertainment/214.txt inflating: bbc/entertainment/215.txt inflating: bbc/entertainment/216.txt inflating: bbc/entertainment/217.txt inflating: bbc/entertainment/218.txt inflating: bbc/entertainment/219.txt inflating: bbc/entertainment/220.txt inflating: bbc/entertainment/221.txt inflating: bbc/entertainment/222.txt inflating: bbc/entertainment/223.txt inflating: bbc/entertainment/224.txt inflating: bbc/entertainment/225.txt inflating: bbc/entertainment/226.txt inflating: bbc/entertainment/227.txt inflating: bbc/entertainment/228.txt inflating: bbc/entertainment/229.txt inflating: bbc/entertainment/230.txt inflating: bbc/entertainment/231.txt inflating: bbc/entertainment/232.txt inflating: bbc/entertainment/233.txt inflating: bbc/entertainment/234.txt inflating: bbc/entertainment/235.txt inflating: bbc/entertainment/236.txt inflating: bbc/entertainment/237.txt inflating: bbc/entertainment/238.txt inflating: bbc/entertainment/239.txt inflating: bbc/entertainment/240.txt inflating: bbc/entertainment/241.txt inflating: bbc/entertainment/242.txt inflating: bbc/entertainment/243.txt inflating: bbc/entertainment/244.txt inflating: bbc/entertainment/245.txt inflating: bbc/entertainment/246.txt inflating: bbc/entertainment/247.txt inflating: bbc/entertainment/248.txt inflating: bbc/entertainment/249.txt inflating: bbc/entertainment/250.txt inflating: bbc/entertainment/251.txt inflating: bbc/entertainment/252.txt inflating: bbc/entertainment/253.txt inflating: bbc/entertainment/254.txt inflating: bbc/entertainment/255.txt inflating: 
bbc/entertainment/256.txt inflating: bbc/entertainment/257.txt inflating: bbc/entertainment/258.txt inflating: bbc/entertainment/259.txt inflating: bbc/entertainment/260.txt inflating: bbc/entertainment/261.txt inflating: bbc/entertainment/262.txt inflating: bbc/entertainment/263.txt inflating: bbc/entertainment/264.txt inflating: bbc/entertainment/265.txt inflating: bbc/entertainment/266.txt inflating: bbc/entertainment/267.txt inflating: bbc/entertainment/268.txt inflating: bbc/entertainment/269.txt inflating: bbc/entertainment/270.txt inflating: bbc/entertainment/271.txt inflating: bbc/entertainment/272.txt inflating: bbc/entertainment/273.txt inflating: bbc/entertainment/274.txt inflating: bbc/entertainment/275.txt inflating: bbc/entertainment/276.txt inflating: bbc/entertainment/277.txt inflating: bbc/entertainment/278.txt inflating: bbc/entertainment/279.txt inflating: bbc/entertainment/280.txt inflating: bbc/entertainment/281.txt inflating: bbc/entertainment/282.txt inflating: bbc/entertainment/283.txt inflating: bbc/entertainment/284.txt inflating: bbc/entertainment/285.txt inflating: bbc/entertainment/286.txt inflating: bbc/entertainment/287.txt inflating: bbc/entertainment/288.txt inflating: bbc/entertainment/289.txt inflating: bbc/entertainment/290.txt inflating: bbc/entertainment/291.txt inflating: bbc/entertainment/292.txt inflating: bbc/entertainment/293.txt inflating: bbc/entertainment/294.txt inflating: bbc/entertainment/295.txt inflating: bbc/entertainment/296.txt inflating: bbc/entertainment/297.txt inflating: bbc/entertainment/298.txt inflating: bbc/entertainment/299.txt inflating: bbc/entertainment/300.txt inflating: bbc/entertainment/301.txt inflating: bbc/entertainment/302.txt inflating: bbc/entertainment/303.txt inflating: bbc/entertainment/304.txt inflating: bbc/entertainment/305.txt inflating: bbc/entertainment/306.txt inflating: bbc/entertainment/307.txt inflating: bbc/entertainment/308.txt inflating: bbc/entertainment/309.txt inflating: 
bbc/entertainment/310.txt ... inflating: bbc/entertainment/386.txt
creating: bbc/politics/ inflating: bbc/politics/001.txt ... inflating: bbc/politics/417.txt inflating: bbc/README.TXT
creating: bbc/sport/ inflating: bbc/sport/001.txt ... inflating: bbc/sport/511.txt
creating: bbc/tech/ inflating: bbc/tech/001.txt ... inflating: bbc/tech/380.txt inflating: 
bbc/tech/381.txt inflating: bbc/tech/382.txt inflating: bbc/tech/383.txt inflating: bbc/tech/384.txt inflating: bbc/tech/385.txt inflating: bbc/tech/386.txt inflating: bbc/tech/387.txt inflating: bbc/tech/388.txt inflating: bbc/tech/389.txt inflating: bbc/tech/390.txt inflating: bbc/tech/391.txt inflating: bbc/tech/392.txt inflating: bbc/tech/393.txt inflating: bbc/tech/394.txt inflating: bbc/tech/395.txt inflating: bbc/tech/396.txt inflating: bbc/tech/397.txt inflating: bbc/tech/398.txt inflating: bbc/tech/399.txt inflating: bbc/tech/400.txt inflating: bbc/tech/401.txt
Load the dataset and convert it to a dataframe.
data = load_files('/content/bbc', encoding="utf-8", decode_error="replace", random_state=random_state)
df = pd.DataFrame(list(zip(data['data'], data['target'])), columns=['text', 'label'])
df.head()
| | text | label |
|---|---|---|
| 0 | Chris Evans back on the market\n\nBroadcaster ... | 1 |
| 1 | Giggs handed Wales leading role\n\nRyan Giggs ... | 3 |
| 2 | Wales silent on Grand Slam talk\n\nRhys Willia... | 3 |
| 3 | Kenya lift Chepkemei's suspension\n\nKenya's a... | 3 |
| 4 | Lee to create new film superhero\n\nComic book... | 1 |
2. Print the unique target names in your data and check the number of articles in each category. Then split your data into training (80%) and test (20%) sets.
labels, counts = np.unique(df['label'], return_counts=True) # np.unique(data.target, return_counts=True)
print(dict(zip(data.target_names, counts)))
{'business': 510, 'entertainment': 386, 'politics': 417, 'sport': 511, 'tech': 401}
X_train, X_test, y_train, y_test = train_test_split(df["text"], df["label"], test_size=0.2, random_state=random_state)
3. Use the CountVectorizer from sklearn and convert the text data into a document-term matrix. What is the difference between CountVectorizer and TfidfVectorizer(use_idf=False)?
# Tokenizer that keeps only alphanumeric tokens, removing unwanted elements such as symbols from our data
token = RegexpTokenizer(r'[a-zA-Z0-9]+')
# Initialize the "CountVectorizer" object, which is scikit-learn's bag-of-words tool.
# If you have memory issues, set max_features to a smaller number so you can continue with the practical
vectorizer = CountVectorizer(lowercase=True,
                             tokenizer=token.tokenize,
                             stop_words='english',
                             ngram_range=(1, 2),
                             analyzer='word',
                             min_df=3,
                             max_features=None)
# fit_transform() does two things: first, it fits the model and learns the vocabulary;
# second, it transforms our data into feature vectors.
# The input to fit_transform should be an iterable of strings.
bbc_dtm = vectorizer.fit_transform(X_train)
print(bbc_dtm.shape)
/usr/local/lib/python3.10/dist-packages/sklearn/feature_extraction/text.py:528: UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None' warnings.warn(
(1780, 25223)
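With use_idf=False, TfidfVectorizer() skips the inverse-document-frequency weighting, so the main difference is that TfidfVectorizer() returns floats while CountVectorizer() returns ints. That is to be expected: CountVectorizer() counts how often each term occurs, whereas TfidfVectorizer() assigns a score, which with the default settings is the term frequency L2-normalized per document.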
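The difference can be checked on a small example. The sketch below uses a hypothetical two-document corpus (not the BBC data) and assumes the default settings of both vectorizers, in particular norm='l2' for TfidfVectorizer. Under those assumptions, dividing each row of raw counts by its Euclidean norm should reproduce the TfidfVectorizer(use_idf=False) output.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, used only for illustration
docs = ["the cat sat on the mat", "the dog chased the cat"]

counts = CountVectorizer().fit_transform(docs)
tf = TfidfVectorizer(use_idf=False).fit_transform(docs)

# Both vectorizers learn the same vocabulary here, so the columns line up
dense_counts = counts.toarray().astype(float)
row_norms = np.linalg.norm(dense_counts, axis=1, keepdims=True)

print("CountVectorizer dtype:", counts.dtype)
print("TfidfVectorizer dtype:", tf.dtype)
print("L2-normalized counts equal tf output:",
      np.allclose(dense_counts / row_norms, tf.toarray()))
```

Passing norm=None in addition to use_idf=False would leave the raw counts unscaled, just stored as floats.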
4. Print the top 20 most frequent words in the training set.
importance = np.argsort(np.asarray(bbc_dtm.sum(axis=0)).ravel())[::-1]
feature_names = np.array(vectorizer.get_feature_names_out())
feature_names[importance[:20]]
array(['s', 'said', 'mr', 'year', 'people', 'new', 't', 'time', 'world', 'government', 'uk', 'years', 'best', 'just', 'told', 'film', 'make', '1', 'game', 'like'], dtype=object)
5. From the feature selection library in sklearn load the SelectKBest function and apply it on the BBC dataset using the chi-squared method. Extract the top 20 features.
X_test_vectorized = vectorizer.transform(X_test)
ch2 = SelectKBest(chi2, k=20)
ch2.fit_transform(bbc_dtm, y_train)
<1780x20 sparse matrix of type '<class 'numpy.int64'>' with 4428 stored elements in Compressed Sparse Row format>
feature_names_chi = [feature_names[i] for i in ch2.get_support(indices=True)]
feature_names_chi
['best', 'blair', 'brown', 'computer', 'digital', 'election', 'film', 'government', 'labour', 'minister', 'mobile', 'mr', 'mr blair', 'music', 'net', 'party', 'people', 'software', 'technology', 'users']
6. Repeat the analysis in Question 5 with the mutual information feature selection method. Do you get the same list of words as compared to the chi-squared method?
mutual_info = SelectKBest(mutual_info_classif, k=20)
mutual_info.fit_transform(bbc_dtm, y_train)
<1780x20 sparse matrix of type '<class 'numpy.int64'>' with 6350 stored elements in Compressed Sparse Row format>
feature_names_mutual_info = [feature_names[i] for i in mutual_info.get_support(indices=True)]
feature_names_mutual_info
['blair', 'coach', 'election', 'film', 'firm', 'game', 'government', 'labour', 'market', 'minister', 'mr', 'music', 'party', 'people', 'said', 'secretary', 'technology', 'tory', 'users', 'win']
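One way to answer the question is to compare the two lists as sets. The sketch below copies the two feature lists from the outputs shown above and reports which words both methods selected and which are unique to each; the variable names shared, only_chi and only_mi are introduced here for illustration.

```python
# Feature lists copied from the chi-squared and mutual information outputs above
feature_names_chi = ['best', 'blair', 'brown', 'computer', 'digital', 'election',
                     'film', 'government', 'labour', 'minister', 'mobile', 'mr',
                     'mr blair', 'music', 'net', 'party', 'people', 'software',
                     'technology', 'users']
feature_names_mutual_info = ['blair', 'coach', 'election', 'film', 'firm', 'game',
                             'government', 'labour', 'market', 'minister', 'mr',
                             'music', 'party', 'people', 'said', 'secretary',
                             'technology', 'tory', 'users', 'win']

chi_set = set(feature_names_chi)
mi_set = set(feature_names_mutual_info)

shared = sorted(chi_set & mi_set)    # selected by both methods
only_chi = sorted(chi_set - mi_set)  # selected by chi-squared only
only_mi = sorted(mi_set - chi_set)   # selected by mutual information only

print(len(shared), "shared:", shared)
print(len(only_chi), "chi-squared only:", only_chi)
print(len(only_mi), "mutual information only:", only_mi)
```

So the two methods agree on part of the list but not all of it. Note that mutual_info_classif involves randomness unless a random_state is fixed, so the mutual information list may vary slightly between runs.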
Now you can build a classifier and train it using the output of these feature selection techniques. We are not going to do this right now, but if you are interested you can transform your training and test set using the selected features and continue with your classifier! Here are some tips:
# X_train = mutual_info.fit_transform(bbc_dtm, y_train)
# X_test = mutual_info.transform(X_test_vectorized)
7. One of the functions for embedded feature selection is the SelectFromModel function in sklearn. Use this function with an L1-norm SVM and check how many non-zero coefficients are left in the model.
print("shape of the matrix before applying the embedded feature selection:", bbc_dtm.shape)
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False)
model = SelectFromModel(lsvc).fit(bbc_dtm, y_train) # you can add threshold=0.18 as another argument to select features that have an importance of more than 0.18
X_new = model.transform(bbc_dtm)
print("shape of the matrix after applying the embedded feature selection:", X_new.shape)
shape of the matrix before applying the embedded feature selection: (1780, 25223)
shape of the matrix after applying the embedded feature selection: (1780, 154)
model
SelectFromModel(estimator=LinearSVC(C=0.01, dual=False, penalty='l1'))
# you can also check the coefficient values
model.estimator_.coef_
array([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]])
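Because the printed coefficient array is truncated, counting the non-zero entries is easier than reading them. The sketch below shows one way to do this on synthetic data from make_classification, which stands in for the document-term matrix. The data, the value C=0.1 and the names n_nonzero and n_active are assumptions made for illustration. On the BBC data the same counting could be applied to model.estimator_.coef_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

# Synthetic stand-in for the document-term matrix (an assumption for illustration)
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=321)

# L1-penalized linear SVM: the penalty drives many coefficients to exactly zero
selector = SelectFromModel(LinearSVC(C=0.1, penalty="l1", dual=False)).fit(X, y)
coef = selector.estimator_.coef_

# Non-zero entries over all rows (one row per class; a single row for binary problems)
n_nonzero = np.count_nonzero(coef)
# Features with a non-zero weight in at least one row
n_active = np.count_nonzero(np.abs(coef).sum(axis=0))
n_selected = int(selector.get_support().sum())

print("coefficient matrix shape:", coef.shape)
print("non-zero coefficients:", n_nonzero)
print("features with any non-zero weight:", n_active)
print("features kept by SelectFromModel:", n_selected)
```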
8. What are the top features according to the SVM model? Tip: Use the function model.get_support() to find these features.
model.get_support()
array([False, False, False, ..., False, False, False])
print("Features selected by SelectFromModel: ", feature_names[model.get_support()])
Features selected by SelectFromModel: ['000' '1' '2' '2004' '6' 'airlines' 'album' 'analysts' 'apple' 'athens' 'athletics' 'award' 'ballet' 'ban' 'band' 'bank' 'bbc' 'best' 'bid' 'blair' 'blog' 'book' 'britain' 'broadband' 'brown' 'business' 'champion' 'chart' 'chelsea' 'chief' 'children' 'china' 'club' 'coach' 'comedy' 'companies' 'company' 'computer' 'conte' 'content' 'council' 'cup' 'data' 'deal' 'digital' 'dollar' 'doping' 'drugs' 'e' 'economic' 'economy' 'education' 'election' 'england' 'eu' 'european' 'euros' 'film' 'financial' 'firm' 'firms' 'fraud' 'game' 'games' 'gaming' 'glazer' 'good' 'government' 'group' 'growth' 'high' 'home' 'howard' 'iaaf' 'information' 'injury' 'just' 'labour' 'league' 'like' 'liverpool' 'lord' 'm' 'make' 'market' 'match' 'microsoft' 'million' 'minister' 'mobile' 'mps' 'mr' 'music' 'musical' 'net' 'new' 'nintendo' 'number' 'o' 'oil' 'old' 'olympic' 'online' 'party' 'people' 'plans' 'play' 'players' 'police' 'president' 'prices' 'public' 'rights' 'rugby' 's' 'said' 'sales' 'says' 'season' 'secretary' 'series' 'service' 'services' 'set' 'shares' 'singer' 'site' 'software' 'sony' 'spam' 'star' 'stars' 'state' 't' 'team' 'technology' 'time' 'trade' 'tv' 'uk' 'united' 'use' 'used' 'users' 'using' 'video' 'virus' 'web' 'website' 'win' 'won' 'world' 'year' 'year old']
9. Create a pipeline with the tfidf representation and a random forest classifier.
clf1 = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('feature_extraction', TfidfTransformer()),
    ('classification', RandomForestClassifier())
])
10. Fit the pipeline on the training set.
clf1.fit(X_train, y_train)
Pipeline(steps=[('vectorizer', CountVectorizer()), ('feature_extraction', TfidfTransformer()), ('classification', RandomForestClassifier())])
11. Use the pipeline to predict the outcome variable on your test set. Evaluate the performance of the pipeline using the classification_report function on the test subset. How do you interpret your results?
y_pred1 = clf1.predict(X_test)
print(metrics.classification_report(y_test, y_pred1, target_names=data.target_names))
               precision    recall  f1-score   support
     business       0.95      0.96      0.95        92
entertainment       0.98      0.94      0.96        84
     politics       0.93      0.92      0.93        77
        sport       0.97      0.99      0.98       111
         tech       0.96      0.98      0.97        81
     accuracy                           0.96       445
    macro avg       0.96      0.96      0.96       445
 weighted avg       0.96      0.96      0.96       445
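To interpret the report, it helps to see where precision and recall come from. The sketch below uses hypothetical labels for a three-class problem (not the BBC predictions) and derives both metrics from the confusion matrix: precision divides the correct predictions for a class by everything predicted as that class, and recall divides them by everything that truly belongs to that class.

```python
import numpy as np
from sklearn import metrics

# Hypothetical true and predicted labels, for illustration only
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]

# Rows are true classes, columns are predicted classes
cm = metrics.confusion_matrix(y_true, y_pred)

correct = np.diag(cm)
precision = correct / cm.sum(axis=0)  # per class: correct / all predicted as this class
recall = correct / cm.sum(axis=1)     # per class: correct / all truly in this class

print(cm)
print("precision:", precision)
print("recall:   ", recall)
```

In the BBC report, a class with high recall but lower precision would be one that the model predicts too readily, picking up articles from other categories.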
12. Create your second pipeline with the tfidf representation and a random forest classifier with the addition of an embedded feature selection using the SVM classification method with L1 penalty. Fit the pipeline on your training set and test it with the test set. How does the performance change?
clf2 = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('feature_extraction', TfidfTransformer()),
    ('feature_selection', SelectFromModel(LinearSVC(penalty="l1", dual=False))),
    ('classification', RandomForestClassifier())
])
clf2.fit(X_train, y_train)
Pipeline(steps=[('vectorizer', CountVectorizer()), ('feature_extraction', TfidfTransformer()), ('feature_selection', SelectFromModel(estimator=LinearSVC(dual=False, penalty='l1'))), ('classification', RandomForestClassifier())])
y_pred2 = clf2.predict(X_test)
print(metrics.classification_report(y_test, y_pred2, target_names=data.target_names))
               precision    recall  f1-score   support
     business       0.91      0.93      0.92        92
entertainment       0.96      0.93      0.95        84
     politics       0.91      0.91      0.91        77
        sport       1.00      0.99      1.00       111
         tech       0.94      0.96      0.95        81
     accuracy                           0.95       445
    macro avg       0.95      0.95      0.95       445
 weighted avg       0.95      0.95      0.95       445
13. Create your third and fourth pipelines with the tfidf representation, a chi2 feature selection (with 20 and 200 features for clf3 and clf4, respectively), and a random forest classifier.
clf3 = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('feature_extraction', TfidfTransformer()),
    ('feature_selection', SelectKBest(chi2, k=20)),
    ('classification', RandomForestClassifier())
])
clf3.fit(X_train, y_train)
Pipeline(steps=[('vectorizer', CountVectorizer()), ('feature_extraction', TfidfTransformer()), ('feature_selection', SelectKBest(k=20, score_func=<function chi2 at 0x7fd8444d23b0>)), ('classification', RandomForestClassifier())])
y_pred3 = clf3.predict(X_test)
print(metrics.classification_report(y_test, y_pred3, target_names=data.target_names))
               precision    recall  f1-score   support
     business       0.65      0.46      0.54        92
entertainment       0.80      0.57      0.67        84
     politics       0.79      0.73      0.76        77
        sport       0.63      0.98      0.76       111
         tech       0.88      0.81      0.85        81
     accuracy                           0.72       445
    macro avg       0.75      0.71      0.71       445
 weighted avg       0.74      0.72      0.71       445
clf4 = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('feature_extraction', TfidfTransformer()),
    ('feature_selection', SelectKBest(chi2, k=200)),
    ('classification', RandomForestClassifier())
])
clf4.fit(X_train, y_train)
Pipeline(steps=[('vectorizer', CountVectorizer()), ('feature_extraction', TfidfTransformer()), ('feature_selection', SelectKBest(k=200, score_func=<function chi2 at 0x7fd8444d23b0>)), ('classification', RandomForestClassifier())])
y_pred4 = clf4.predict(X_test)
print(metrics.classification_report(y_test, y_pred4, target_names=data.target_names))
               precision    recall  f1-score   support
     business       0.86      0.92      0.89        92
entertainment       0.97      0.90      0.94        84
     politics       0.92      0.88      0.90        77
        sport       0.99      0.98      0.99       111
         tech       0.93      0.96      0.95        81
     accuracy                           0.93       445
    macro avg       0.93      0.93      0.93       445
 weighted avg       0.94      0.93      0.94       445
14. We can change the learner by simply plugging a different classifier object into our pipeline. Create your fifth pipeline with L1 norm SVM for the feature selection method and naive Bayes for the classifier. Compare your results on the test set with the previous pipelines.
clf5 = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('feature_extraction', TfidfTransformer()),
    ('feature_selection', SelectFromModel(LinearSVC(penalty="l1", dual=False))),
    ('classification', MultinomialNB(alpha=0.01))
])
clf5.fit(X_train, y_train)
Pipeline(steps=[('vectorizer', CountVectorizer()), ('feature_extraction', TfidfTransformer()), ('feature_selection', SelectFromModel(estimator=LinearSVC(dual=False, penalty='l1'))), ('classification', MultinomialNB(alpha=0.01))])
y_pred5 = clf5.predict(X_test)
print(metrics.classification_report(y_test, y_pred5, target_names=data.target_names))
               precision    recall  f1-score   support
     business       0.96      0.93      0.95        92
entertainment       1.00      0.94      0.97        84
     politics       0.95      0.99      0.97        77
        sport       1.00      1.00      1.00       111
         tech       0.93      0.98      0.95        81
     accuracy                           0.97       445
    macro avg       0.97      0.97      0.97       445
 weighted avg       0.97      0.97      0.97       445
15. Dimensionality reduction methods such as PCA and SVD can be used to project the data into a lower dimensional space. If you run PCA with your text data, you might end up with the message:
PCA does not support sparse input. See TruncatedSVD for a possible alternative.
Therefore, we will use the TruncatedSVD function from the sklearn package to find out how much of the variance in the BBC data set is explained by different numbers of components. For this, first create a tfidf matrix and use it to make a co-occurrence matrix.
tfidf_vect = TfidfVectorizer()
X = tfidf_vect.fit_transform(X_train)
Xc = (X.T @ X) # term-term co-occurrence matrix in sparse CSR format
Xc.setdiag(0) # set each word's co-occurrence with itself to 0
print("Shape of the TFIDF vectorizer:", X.shape)
Shape of the TFIDF vectorizer: (1780, 26739)
print(Xc.todense()) # print out matrix in dense format
[[0. 0.00024418 0. ... 0. 0. 0. ] [0.00024418 0. 0. ... 0. 0. 0. ] [0. 0. 0. ... 0. 0. 0. ] ... [0. 0. 0. ... 0. 0. 0. ] [0. 0. 0. ... 0. 0. 0. ] [0. 0. 0. ... 0. 0. 0. ]]
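The full co-occurrence matrix is too large to inspect by eye, so a tiny example may make the construction clearer. The sketch below uses a hypothetical count matrix of 3 documents by 3 terms instead of the tfidf matrix. Entry (i, j) of X.T @ X sums, over all documents, the product of the weights of term i and term j, so it is non-zero only when the two terms appear in the same document. The conversion to LIL format before setdiag is a choice made here to keep the diagonal update cheap; it is not part of the original code.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical count matrix: rows are documents, columns are terms
X_toy = csr_matrix(np.array([[1, 1, 0],
                             [0, 1, 1],
                             [1, 0, 0]]))

# Term-term co-occurrence: (n_terms x n_docs) @ (n_docs x n_terms)
Xc_toy = (X_toy.T @ X_toy).tolil()
Xc_toy.setdiag(0)  # ignore each term's co-occurrence with itself

print(Xc_toy.toarray())
```

Terms 0 and 1 share the first document and terms 1 and 2 share the second, while terms 0 and 2 never appear together, which is what the off-diagonal entries reflect.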
16. Run the TruncatedSVD function with different values for the number of components: 1, 2, 4, 5, 10, 15, 20, 50, 100. Plot the total explained variance ratio against the number of components.
n_comp = [1, 2, 4, 5, 10, 15, 20, 50, 100] # list containing different values of components
explained = [] # total explained variance ratio for each number of components
for x in n_comp:
    svd = TruncatedSVD(n_components=x, random_state=321)
    svd.fit(Xc)
    explained.append(svd.explained_variance_ratio_.sum())
    print("Number of components = %r and explained variance = %r"%(x,svd.explained_variance_ratio_.sum()))
plt.plot(n_comp, explained)
plt.xlabel('Number of components')
plt.ylabel("Explained Variance")
plt.title("Plot of Number of components v/s explained variance")
plt.show()
Number of components = 1 and explained variance = 0.8302335701200982
Number of components = 2 and explained variance = 0.916509363220827
Number of components = 4 and explained variance = 0.9291569440571863
Number of components = 5 and explained variance = 0.934486055353985
Number of components = 10 and explained variance = 0.9477460365510288
Number of components = 15 and explained variance = 0.9529731073724189
Number of components = 20 and explained variance = 0.9560375991471989
Number of components = 50 and explained variance = 0.9645814066707994
Number of components = 100 and explained variance = 0.971119162449249
17. How many components are needed to explain at least 95% of the variance?
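Among the values tried, 15 is the smallest number of components that explains at least 95% of the variance: 10 components explain about 94.8% and 15 components about 95.3%. The exact minimum may lie anywhere between 11 and 15, since those intermediate values were not tested.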
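The same answer can be read off programmatically. The sketch below copies the explained variance values from the output above and picks the smallest tested number of components that reaches the 95% threshold; the names threshold and enough are introduced here for illustration.

```python
# Values copied from the TruncatedSVD output above
n_comp = [1, 2, 4, 5, 10, 15, 20, 50, 100]
explained = [0.8302335701200982, 0.916509363220827, 0.9291569440571863,
             0.934486055353985, 0.9477460365510288, 0.9529731073724189,
             0.9560375991471989, 0.9645814066707994, 0.971119162449249]

threshold = 0.95

# Keep only the tested settings whose total explained variance reaches the threshold
enough = [n for n, ev in zip(n_comp, explained) if ev >= threshold]
smallest = min(enough)

print("Smallest tested number of components reaching 95%:", smallest)
```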
18. Use these components and train an SVM model on the BBC dataset. Make a pipeline for your model. Compare your results on the test set with the previous pipelines.
clf6 = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('feature_extraction', TfidfTransformer()),
    ('feature_selection', TruncatedSVD(n_components=15, random_state=321)),
    ('classification', LinearSVC())
])
clf6.fit(X_train, y_train)
Pipeline(steps=[('vectorizer', CountVectorizer()), ('feature_extraction', TfidfTransformer()), ('feature_selection', TruncatedSVD(n_components=15, random_state=321)), ('classification', LinearSVC())])
y_pred6 = clf6.predict(X_test)
print(metrics.classification_report(y_test, y_pred6, target_names=data.target_names))
               precision    recall  f1-score   support
     business       0.91      0.92      0.92        92
entertainment       1.00      0.94      0.97        84
     politics       0.91      0.92      0.92        77
        sport       0.99      1.00      1.00       111
         tech       0.94      0.96      0.95        81
     accuracy                           0.95       445
    macro avg       0.95      0.95      0.95       445
 weighted avg       0.95      0.95      0.95       445
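To compare the six pipelines, their test accuracies can be put side by side. The sketch below copies the accuracy values from the classification reports above; the short descriptions in the dictionary keys are a summary written for this example. Since the values are rounded to two decimals, come from a single train/test split, and the random forests were trained without a fixed random_state, differences of about 0.01 should probably not be over-interpreted.

```python
# Test accuracies copied from the classification reports above
accuracy = {
    "clf1: tfidf + random forest": 0.96,
    "clf2: tfidf + L1-SVM selection + random forest": 0.95,
    "clf3: tfidf + chi2 (k=20) + random forest": 0.72,
    "clf4: tfidf + chi2 (k=200) + random forest": 0.93,
    "clf5: tfidf + L1-SVM selection + naive Bayes": 0.97,
    "clf6: tfidf + TruncatedSVD (15) + linear SVM": 0.95,
}

# Sort pipelines from highest to lowest accuracy
ranked = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)

for name, acc in ranked:
    print(f"{acc:.2f}  {name}")
```

The clearest effect is at the low end: keeping only 20 chi2 features loses a lot of accuracy, while the other pipelines stay within a few points of each other, including the one that reduces the data to just 15 SVD components.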