
Beautiful Soup To Csv

There are a few threads on getting Beautiful Soup data into CSV files, but I can't find one that makes sense with my code. I am scraping the WSJ biggest gainers page. 3 to 103 gives me th

Solution 1:

The problem is you are referencing symbol('td') outside of a loop that iterates over your data. What you are essentially doing is this:

scrapeData = [...]  # list of the scraped data

for symbol in scrapeData:
    print(symbol)

# symbol is now set to the last item in scrapeData
# open file etc.
# for row in scrapeData length - do this next action that many times:
print(symbol('td')[0])  # this prints the first element of symbol, which is
                        # the last element in scrapeData - there is no
                        # connection to the row at all

What you need to do is extract the values within your first loop over scrapeData and put them into a temporary list. Then iterate over that list when writing to the CSV file.
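A minimal sketch of that pattern, using made-up sample rows in place of the scraped table rows (the file name and column labels are hypothetical):

```python
import csv

# stand-in for the scraped data: each inner list plays the role of the
# <td> texts from one table row (hypothetical sample values)
scrapeData = [
    ['1', 'ACME', '12.34'],
    ['2', 'WIDG', '5.67'],
]

rows = []
for symbol in scrapeData:
    # read the cell values inside the loop, while `symbol` still refers
    # to the current row, and stash them in a temporary list
    rows.append([symbol[0], symbol[1], symbol[2]])

# write the collected rows in a second pass
with open('gainers.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['rank', 'issue', 'price'])  # header
    writer.writerows(rows)
```

The key point is that the cell extraction happens inside the loop, so each row's values are captured before `symbol` moves on.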

Solution 2:

The table at the given URL can be accessed with .find('table', {'class' : 'mdcTable'}), as I stated in my comment above. The first row of the table is the header and does not contain actual data, so it is skipped via the [1:] slice in for row in table.findAll('tr')[1:]. (A 'smart' implementation would extract the header cells as column names, but I haven't implemented that.)

To separate data extraction from CSV export, I store each extracted row in a dictionary d, which is appended to a list called data. All extracted values are cleaned up so that integers, floats and strings are stored as such, and unneeded newline characters and commas are removed.

After all data is collected, stored in dicts and appended to the data list, the whole list is written to a csv file using DictWriter and the appropriate .writeheader() and .writerows() methods:

#!/usr/bin/env python3
# coding: utf-8
import csv
import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen('http://www.wsj.com/mdc/public/page/2_3021-gainnyse-gainer.html')
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class' : 'mdcTable'})

data = []

for row in table.findAll('tr')[1:]:
    col = row.findAll('td')

    d = {    
            'rank' : col[0].getText(),
            'issue': col[1].getText(),
            'price' : col[2].getText(),
            'change' : col[3].getText(),
            'per_change' : col[4].getText(),
            'volume' : col[5].getText()
        }

    for key, val in d.items():
        val = val.replace('\n', '')
        val = val.replace(',', '')
        d[key] = val
        try:
            if key not in ['rank', 'volume']:
                d[key] = float(val)
            else:
                d[key] = int(val)
        except ValueError:
            pass

    data.append(d)

order = ['rank', 'issue', 'price', 'change', 'per_change', 'volume']

with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=order)
    writer.writeheader()
    writer.writerows(data)
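Once written, the file can be sanity-checked by reading it back with csv.DictReader, which uses the header row as the keys. A small sketch with hypothetical sample values standing in for the scraped WSJ data:

```python
import csv

order = ['rank', 'issue', 'price', 'change', 'per_change', 'volume']

# write a tiny file in the same shape as output.csv (sample values,
# not real WSJ data)
with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=order)
    writer.writeheader()
    writer.writerow({'rank': 1, 'issue': 'ACME', 'price': 12.34,
                     'change': 0.5, 'per_change': 4.2, 'volume': 100000})

# DictReader maps each row to a dict keyed by the header; note that
# every value comes back as a string
with open('output.csv', newline='') as f:
    rows = list(csv.DictReader(f))
```

Because CSV is untyped, the int/float conversion done before writing is lost on read-back; anything that consumes the file has to re-convert.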

Solution 3:

This is how to write a CSV file:

import csv

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
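If more rows need to be added later, the file can be reopened in append mode; writeheader() is only called once, when the file is first created. A small sketch (file name and rows are illustrative):

```python
import csv

fieldnames = ['first_name', 'last_name']

# first write: create the file and emit the header once
with open('names.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})

# later: reopen in 'a' mode and skip writeheader(), otherwise a second
# header row would land in the middle of the data
with open('names.csv', 'a', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
```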
