Beautiful Soup To Csv
Solution 1:
The problem is that you reference symbol('td') outside of the loop that iterates over your data. What you are essentially doing is this:

scrapeData = [...]  # list of the scraped data
for symbol in scrapeData:
    print symbol
# symbol is now set to the last item in scrapeData
# open file etc.
# for row in scrapeData length - do this next action that many times:
print symbol('td')[0]  # this prints the first element of symbol, which is the
                       # last element in scrapeData - there is no connection to
                       # the row at all
What you need to do is scrape the values within your first loop over scrapeData - put it into a temporary list. Then iterate over the list when writing to the CSV file.
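A minimal sketch of that restructuring might look like the following. The HTML snippet and the filename symbols.csv are made up for illustration; the real rows would come from your scrape:

```python
import csv
from bs4 import BeautifulSoup

# Hypothetical stand-in HTML for the scraped page.
html = """
<table>
  <tr><td>1</td><td>AAA</td></tr>
  <tr><td>2</td><td>BBB</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')
scrapeData = soup.find_all('tr')

# Extract the cell values inside the loop, while each row is in scope...
rows = []
for symbol in scrapeData:
    rows.append([td.get_text() for td in symbol('td')])

# ...then iterate over the collected list when writing the CSV.
with open('symbols.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)
```

Calling a tag like symbol('td') is BeautifulSoup shorthand for symbol.find_all('td'); the key point is that it happens per row, inside the loop.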
Solution 2:
The table on the given URL can be accessed with .find('table', {'class': 'mdcTable'}), as I stated in my comment above. The first row of the table has to be skipped, since it is the header and does not contain actual data (a slightly 'smarter' implementation could extract the header as column names, but I haven't implemented that); this is done by the [1:] in for row in table.findAll('tr')[1:].

To separate data extraction from CSV export, I store each extracted row in a dictionary d, which is appended to a list called data. All extracted values are formatted so that integers, floats and strings are stored as such, and unneeded newline characters and commas are removed.
After all data has been collected, stored in dicts and appended to the data list, the whole list is written to a CSV file using DictWriter with its .writeheader() and .writerows() methods:
#!/usr/bin/env python3
# coding: utf-8
import csv
import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen('http://www.wsj.com/mdc/public/page/2_3021-gainnyse-gainer.html')
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class': 'mdcTable'})

data = []
for row in table.findAll('tr')[1:]:
    col = row.findAll('td')
    d = {
        'rank': col[0].getText(),
        'issue': col[1].getText(),
        'price': col[2].getText(),
        'change': col[3].getText(),
        'per_change': col[4].getText(),
        'volume': col[5].getText()
    }
    for key, val in d.items():
        val = val.replace('\n', '')
        val = val.replace(',', '')
        d[key] = val
        try:
            if key not in ['rank', 'volume']:
                d[key] = float(val)
            else:
                d[key] = int(val)
        except Exception:
            pass
    data.append(d)

order = ['rank', 'issue', 'price', 'change', 'per_change', 'volume']
# newline='' prevents the csv module from writing blank lines on Windows
with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=order)
    writer.writeheader()
    writer.writerows(data)
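For a quick sanity check, the exported file can be read back with csv.DictReader. The sketch below uses made-up sample rows and a hypothetical filename sample.csv (the scrape above needs network access), but the same round trip applies to output.csv:

```python
import csv

# Made-up sample rows shaped like the scraper's dicts (hypothetical values).
order = ['rank', 'issue', 'price', 'change', 'per_change', 'volume']
data = [
    {'rank': 1, 'issue': 'ABC', 'price': 10.5, 'change': 0.5,
     'per_change': 5.0, 'volume': 1000},
    {'rank': 2, 'issue': 'XYZ', 'price': 20.0, 'change': -1.0,
     'per_change': -4.8, 'volume': 2500},
]

with open('sample.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=order)
    writer.writeheader()
    writer.writerows(data)

# DictReader yields one dict per row, keyed by the header line.
with open('sample.csv') as f:
    rows = list(csv.DictReader(f))

print(rows[0]['issue'])  # note: values come back as strings
```

Keep in mind that DictReader returns every field as a string, so the int/float conversion from the loop above would have to be reapplied if you process the file further.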
Solution 3:
This is how to write to a CSV file:
import csv

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})