I want to combine two <table> elements: one holds the header and the other holds the table values. The first table consists of <table> and <thead> with an empty <tbody>, so it carries header information only. The second table consists of <table>, an empty <thead>, and a <tbody> holding the table values only.
HTML code
html = """<div style="border: 1px solid #000;">
<div style="background-color:#005297;">
<table id="CCCCCT" class="BBBBBt" style="width: calc(100% - 16px)">
<thead>
<tr>
<td><span class="AAAAA">DD </span> EE</td><td>FF</td><td>GG</td><td>HH</td><td>II</td>
</tr>
</thead>
<tbody></tbody>
</table>
</div>
<table id="CCCCC" class="BBBBB">
<thead>
<tr>
<td></td><td></td><td></td><td></td><td></td>
</tr>
</thead>
<tbody>
<tr class="JJJJJ"><td><div>1111111</div></td><td>M</td><td>4444444</td><td><div>77777<i
class="PPPPPP"></i> 10101010101</div></td><td><span class="">13131313131aa</span></td></tr>
<tr class="KKKKK"><td><div>2222222</div></td><td>N</td><td>5555555</td><td><div>88888<i
class="PPPPPP"></i> 1111111111</div></td><td><span class="QQQQQ">1414141414141aa</span></td>
</tr>
<tr class="LLLLL"><td><div>3333333</div></td><td>O</td><td>6666666</td><td><div>999999<i
class="PPPPPP"></i> 1212121212121</div></td><td><span class="">15151515151aa</span></td></tr>
</tbody>
</table>
</div>"""
Python Code
from bs4 import BeautifulSoup
import pandas as pd
import re

soup = BeautifulSoup(html, 'html.parser')

table = soup.find('div', attrs={'style': re.compile("^border:.*$")})
df_list = pd.read_html(str(table))
df_list
Execution Result
[Empty DataFrame
Columns: [DD EE, FF, GG, HH, II]
Index: [],
   Unnamed: 0  Unnamed: 1  Unnamed: 2            Unnamed: 3       Unnamed: 4
0     1111111           M     4444444     77777 10101010101    13131313131aa
1     2222222           N     5555555      88888 1111111111  1414141414141aa
2     3333333           O     6666666  999999 1212121212121    15151515151aa]
Expected Result (5 columns)
     DD EE  FF       GG                    HH               II
0  1111111   M  4444444     77777 10101010101    13131313131aa
1  2222222   N  5555555      88888 1111111111  1414141414141aa
2  3333333   O  6666666  999999 1212121212121    15151515151aa
Answer
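import pandas as pd

# pd.read_html can read a URL directly; fetching the page is already handled under the hood
df = pd.read_html("URL DIRECTLY")  # returns a list of DataFrames, one per <table>
# df[0] is the header-only table and df[1] holds the values, so copy the column names across
df[1].columns = df[0].columns
print(df[1])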
Output:
     DD EE  FF       GG                    HH               II
0  1111111   M  4444444     77777 10101010101    13131313131aa
1  2222222   N  5555555      88888 1111111111  1414141414141aa
2  3333333   O  6666666  999999 1212121212121    15151515151aa
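The assignment df[1].columns = df[0].columns matches columns purely by position, so it only works when both tables have the same number of columns. Otherwise pandas raises a ValueError about a length mismatch. Below is a minimal sketch of a guarded version. The helper name apply_header is hypothetical, and the two frames are built by hand here only to stand in for what pd.read_html returned in the question.

```python
import pandas as pd


def apply_header(header_df: pd.DataFrame, body_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of body_df labelled with header_df's column names.

    Hypothetical helper: columns are matched by position only.
    """
    if len(header_df.columns) != len(body_df.columns):
        raise ValueError(
            f"header has {len(header_df.columns)} columns, "
            f"body has {len(body_df.columns)}"
        )
    out = body_df.copy()  # leave the caller's frame untouched
    out.columns = header_df.columns
    return out


# Stand-ins for the two frames from the question:
# an empty frame that only carries the header, and a frame with unnamed columns.
header_df = pd.DataFrame(columns=["DD EE", "FF", "GG", "HH", "II"])
body_df = pd.DataFrame(
    [[1111111, "M", 4444444, "77777 10101010101", "13131313131aa"]],
    columns=[f"Unnamed: {i}" for i in range(5)],
)

combined = apply_header(header_df, body_df)
print(combined)
```

Working on a copy keeps the original frame returned by read_html unchanged, which can help when you want to compare the two afterwards.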
Or applying to your example directly:
from io import StringIO

import pandas as pd

html = """<div style="border: 1px solid #000;">
<div style="background-color:#005297;">
<table id="CCCCCT" class="BBBBBt" style="width: calc(100% - 16px)">
<thead>
<tr>
<td><span class="AAAAA">DD </span> EE</td>
<td>FF</td>
<td>GG</td>
<td>HH</td>
<td>II</td>
</tr>
</thead>
<tbody></tbody>
</table>
</div>
<table id="CCCCC" class="BBBBB">
<thead>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</thead>
<tbody>
<tr class="JJJJJ">
<td>
<div>1111111</div>
</td>
<td>M</td>
<td>4444444</td>
<td>
<div>77777<i class="PPPPPP"></i> 10101010101</div>
</td>
<td><span class="">13131313131aa</span></td>
</tr>
<tr class="KKKKK">
<td>
<div>2222222</div>
</td>
<td>N</td>
<td>5555555</td>
<td>
<div>88888<i class="PPPPPP"></i> 1111111111</div>
</td>
<td><span class="QQQQQ">1414141414141aa</span></td>
</tr>
<tr class="LLLLL">
<td>
<div>3333333</div>
</td>
<td>O</td>
<td>6666666</td>
<td>
<div>999999<i class="PPPPPP"></i> 1212121212121</div>
</td>
<td><span class="">15151515151aa</span></td>
</tr>
</tbody>
</table>
</div>"""

# Wrap the string in StringIO: newer pandas versions deprecate passing literal HTML
df = pd.read_html(StringIO(html))
# df[0] is the header-only table, df[1] holds the values
df[1].columns = df[0].columns
print(df[1])
This prints the same output.
Feel free to pass attrs to pd.read_html to narrow down which tables are parsed, according to your needs.
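As a rough illustration of the attrs option, the sketch below selects each table by its id instead of relying on its position in the returned list. The HTML is a shortened two-column stand-in for the example above, not the full page, and the variable names header and body are only for illustration. It assumes an HTML parser that pandas supports (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Shortened stand-in for the question's HTML: same ids, fewer columns and rows.
html = """<div>
<table id="CCCCCT">
<thead><tr><td>DD EE</td><td>FF</td></tr></thead>
<tbody></tbody>
</table>
<table id="CCCCC">
<thead><tr><td></td><td></td></tr></thead>
<tbody>
<tr><td>1111111</td><td>M</td></tr>
<tr><td>2222222</td><td>N</td></tr>
</tbody>
</table>
</div>"""

# attrs restricts parsing to <table> elements whose attributes match exactly.
header = pd.read_html(StringIO(html), attrs={"id": "CCCCCT"})[0]
body = pd.read_html(StringIO(html), attrs={"id": "CCCCC"})[0]

# Same positional column assignment as in the answer.
body.columns = header.columns
print(body)
```

Selecting by id may be more robust than df[0] and df[1] if the real page contains other tables before or between these two.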