Transmission Control Protocol (TCP) works on the transport layer.
TCP port numbers…
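    # Sockets in Python
    import socket

    # Open a TCP connection to the web server on port 80 (HTTP)
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect(('www.py4inf.com', 80))
    mysock.send('GET http://py4inf.com/code/romeo.txt HTTP/1.0\n\n')

    # Read up to 512 bytes at a time until the server closes the connection
    while True:
        data = mysock.recv(512)
        if len(data) < 1:
            break
        print data

    mysock.close()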
HTTP/1.1 200 OK
Date: Mon, 09 Nov 2015 21:19:27 GMT
Last-Modified: Fri, 07 Aug 2015 16:39:14 GMT
Cache-Control: max-age=604800, public
Access-Control-Allow-Headers: origin, x-requested-with, content-type
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
A socket is a low-level layer.
(Its output keeps the HTTP header info.)
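Since the raw socket output mixes headers and body, here is a minimal sketch of how the two could be separated. It is written for Python 3 (unlike the Python 2 code in this post), `split_response` is a hypothetical helper of my own and not part of the socket module, and the sample bytes are an abridged copy of the output above.

```python
# Python 3 sketch: separate HTTP headers from the body of a raw response.
# split_response is a hypothetical helper, not part of the socket module.

def split_response(raw):
    """Return (status_line, headers_dict, body_bytes) for raw response bytes."""
    # The header section ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Abridged sample built from the output shown above.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Mon, 09 Nov 2015 21:19:27 GMT\r\n"
    b"Cache-Control: max-age=604800, public\r\n"
    b"\r\n"
    b"But soft what light through yonder window breaks\n"
)

status, headers, body = split_response(sample)
print(status)
print(headers["cache-control"])
print(body.decode())
```

With a real socket you would first join all the chunks returned by `recv` and then pass the result to the helper.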
    import urllib

    fhand = urllib.urlopen('http://py4inf.com/code/romeo.txt')
    for line in fhand:
        print line.strip()
With urllib, reading a URL is just like opening a file.
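To show the "like a file" idea a bit more concretely, here is a rough Python 3 sketch (in Python 3, `urlopen` lives in `urllib.request`). The helper names `stripped_lines` and `fetch_lines` are my own, not part of urllib, and the in-memory `fake_file` is only a stand-in so the loop can be tried without a network connection.

```python
# Python 3 sketch; stripped_lines and fetch_lines are hypothetical helper names.
import io
from urllib.request import urlopen


def stripped_lines(fhand):
    """Yield each line of a binary file-like object as stripped text."""
    for line in fhand:
        yield line.decode("utf-8", errors="replace").strip()


def fetch_lines(url):
    """Open url like a file and return its lines (needs network access)."""
    with urlopen(url) as fhand:
        return list(stripped_lines(fhand))


# Offline stand-in for the remote file: the same loop works on any
# file-like object, which is the point of the comparison.
fake_file = io.BytesIO(
    b"But soft what light through yonder window breaks\n"
    b"It is the east and Juliet is the sun\n"
)
for text in stripped_lines(fake_file):
    print(text)
```

Calling `fetch_lines('http://py4inf.com/code/romeo.txt')` should behave like the loop above, assuming the URL is still reachable.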
Parsing HTML with the BeautifulSoup library
Regular expressions can be used to parse HTML, but the easier way is to use “Beautiful Soup”.
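For comparison, the regular-expression route might look roughly like the sketch below (Python 3). The HTML string is a made-up sample, and the pattern is a deliberately simple one that only catches double-quoted links starting with `http`, which is part of why regular expressions get fragile on real pages.

```python
import re

# Made-up sample page; example.com is a placeholder, not a site from this post.
html = (
    '<p>See <a href="http://example.com/page1.htm">one</a> and '
    "<a class='x' href='page2.htm'>two</a>.</p>"
)

# Matches only double-quoted href values that start with "http".
links = re.findall(r'href="(http[^"]*)"', html)
print(links)
```

The second link is missed because it is relative and uses single quotes; handling every such variation by hand is where a real parser starts to pay off.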
Place BeautifulSoup.py in the same folder as your other Python code.
(I am using version 4.1.)
Unzip the file, then install it with this command:
>> python setup.py install
If you are using PyDev in Eclipse, you will find that it detects the change automatically.
The code follows:
    import urllib
    from bs4 import BeautifulSoup

    url = raw_input('Enter - ')
    html = urllib.urlopen(url).read()
    # for older versions, it should be: soup = BeautifulSoup(html)
    soup = BeautifulSoup(html, "html.parser")

    tags = soup('a')
    for tag in tags:
        print tag.get('href', None)
The code finds all the hyperlink (anchor) tags and prints the URL of each.
Enter - http://www.dr-chuck.com/
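If Beautiful Soup is not installed, the same idea can be sketched with the standard library's `html.parser` instead. This is a different technique from the one used above, written for Python 3; `LinkCollector` is a hypothetical class name, and the `page` string is a made-up sample rather than the real content of the site.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, or None when the tag has no href."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs for this tag.
            self.hrefs.append(dict(attrs).get("href"))


# Made-up sample HTML: one link with an href and one anchor without.
page = '<a href="http://www.dr-chuck.com/">home</a> <a name="top">anchor</a>'
collector = LinkCollector()
collector.feed(page)
print(collector.hrefs)
```

Like `tag.get('href', None)` in the Beautiful Soup version, an anchor without an `href` shows up as `None`.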
Off topic:
This is the 20th post of my blog. I have decided that I will not be too serious a blogger; that is, I will not only talk about techniques.
I registered for another module, Regression Models, but have not had time to start learning seriously. (Winter makes people lazy…)
I am busy preparing for a two-week business trip: visas (wtf, passport courier fees are killing me) and tickets. I will spend one week on holiday in December and then go back to work. Hopefully I will survive the whole winter, with more and better posts.
Thank you all.