Questions tagged [web-scraping]

Web scraping is the process of extracting specific information from websites that do not readily provide an API or other methods of automated data retrieval. Questions about "How To Get Started With Scraping" (e.g. with Excel VBA) should be *thoroughly researched*, as numerous functional code samples are available. Web scraping methods include third-party applications, development of custom software, or even manual data collection in a standardized way.

-2
votes
1 answer
17 views

Web scraping with Angular

I am new to Angular and I am trying to make a web scraping app. The problem is that I do not know how to fetch the entire HTML content of a specific web page with HttpClient (then I want to use regex to ...
0
votes
0 answers
16 views

Unable to make my script stop when some URLs are scraped

I've created a script in Scrapy to parse the titles of different sites listed in start_urls. The script is doing its job flawlessly. What I wish to do now is to let my script stop after two of the URLs ...
0
votes
0 answers
14 views

.wait() in Nightmare fails to find an ID

I'm trying to get the reviews on AliExpress, but for some reason the wait() function always fails to find #transction-feedback. So technically, if you go to that link and click the Feedback tab, it will show all ...
1
vote
1 answer
32 views

I want to extract links of members

I am trying to extract links of the following members: from bs4 import BeautifulSoup import requests r = requests.get('https://www.aapkiawaz.in/about/doctor-hospital-directory-medical-directory-...
0
votes
0 answers
22 views

Trying to scrape and segregate into headings and contents. The problem is that both have the same class and tags. How do I segregate them?

I am trying to scrape http://www.intermediary.natwest.com/intermediary-solutions/lending-criteria.html and segregate it into 2 parts, Heading and Content. The problem is that both have the same class and ...
0
votes
0 answers
21 views

Selecting a specific child of a table class

I am scraping a website and there is a table of values held in a table class. I am trying to get the third row of the table, but I'm not sure how to select that one specifically and not the others.
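One way to pick out a single row is to collect the rows in document order and index into the list, so the third row is index 2. The sketch below does this with the standard library's `html.parser` so it runs without extra packages; the table markup is hypothetical. With BeautifulSoup the same idea is indexing the list returned by `find_all("tr")`.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical table standing in for the real page.
html = """
<table class="values">
  <tr><td>a</td><td>1</td></tr>
  <tr><td>b</td><td>2</td></tr>
  <tr><td>c</td><td>3</td></tr>
  <tr><td>d</td><td>4</td></tr>
</table>
"""

parser = RowCollector()
parser.feed(html)
parser.close()

third_row = parser.rows[2]  # zero-based index: 2 is the third row
print(third_row)  # ['c', '3']
```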
0
votes
0 answers
18 views

While web scraping, it will not scrape all content in each link

I am trying to scrape each job description on Glassdoor, https://www.glassdoor.ca/Job/new-york-state-data-scientist-jobs-SRCH_IL.0,14_IS428_KO15,29.htm; within a page there are a lot of job postings and ...
0
votes
2 answers
21 views

Trying to extract data using Python/Scrapy and not able to find the correct XPath

I wanted to scrape the website http://resolved-error.com/jobs?med=site-ui&ref=jobs-tab. I want to extract the Title, Location and Company of the job postings. I tried a few XPaths for the location,...
0
votes
3 answers
50 views

How to extract data from regex expression in this instance?

I am attempting to scrape rider names from this URL. Currently I'm struggling with my regex, as it works fine in matching the content (regex101); however, I'm unsure how I take a matching ...
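A hedged sketch of the usual answer: put the part you want in a capture group and use `re.findall`, which then returns only what the group matched. The markup and the `class="rider"` pattern are hypothetical, since the real page is not shown; for anything beyond simple, regular markup an HTML parser is usually more robust than a regex.

```python
import re

# Hypothetical markup; the real page's structure is not shown in the question.
html = """
<a class="rider" href="/riders/1">Jane Doe</a>
<a class="rider" href="/riders/2">John Smith</a>
"""

# The parentheses form a capture group; findall returns only the group's text.
pattern = re.compile(r'<a class="rider" href="[^"]*">([^<]+)</a>')
rider_names = pattern.findall(html)
print(rider_names)  # ['Jane Doe', 'John Smith']

# For a single match, use search() and read the group explicitly.
first = pattern.search(html)
if first is not None:
    print(first.group(1))  # Jane Doe
```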
0
votes
1 answer
34 views

How to scrape a website where I need to post information

I want to scrape announcement information from https://nseindia.com/corporates/corporateHome.html?id=allAnnouncements. Specifically, I want to go to the Corporate Information tab on the left-hand side of ...
1
vote
0 answers
20 views

Using ProcessPoolExecutor for Web Scraping: How to get data back to queue and results?

I have written a program to crawl a single website and scrape certain data. I would like to speed up its execution by using ProcessPoolExecutor. However, I am having trouble understanding how I can ...
1
vote
1 answer
28 views

BeautifulSoup can't find div with specific class

For some background, I have been trying to learn web scraping to get some images for machine learning projects involving CNNs. I have been trying to scrape some images from a site (HTML code on the ...
1
vote
2 answers
35 views

Using selenium to find indexed element within a div

I'm scraping the front-end of a webpage and having difficulty getting the HTML text of a div within a div. Basically, I'm simulating clicks - one for each event listed on the page. From there, I ...
0
votes
2 answers
32 views

Web scraping using Selenium: issue with same class name

I'm extracting some soccer data from a simple dynamic table using Selenium, but the problem is that when I'm trying to get the "text-center" class name, it shows a lot of extra data that I don't want. I've ...
0
votes
2 answers
22 views

How to download a page with lazy loading?

I need to download the full page and parse it, but it creates some elements with the help of JavaScript. When I try to do this with urllib, I receive an HTML page without the elements created by JavaScript. How can I ...
0
votes
1 answer
8 views

How to scrape “raw” text-nodes between named elements with Cheerio

Cheerio does not like HTML without proper tags (who does, really?). I'm trying to scrape some menus and the content I want is between elements in the HTML. Is there a way to parse each of these and ...
1
vote
1 answer
28 views

Read list of links with Beautiful Soup

I've been trying to read the links from a list of URLs I successfully extracted. My problem is that I get a TypeError Traceback (most recent call last) when I try to read the whole list. However, when ...
-2
votes
0 answers
9 views

Scraping Web Table Data To Assemble The Data In Unique Table Data Structure

I want to scrape web table data and assemble it into a unique table data structure in PHP. See the picture below to understand the question. How can I do it? Please help me https://i.stack....
0
votes
0 answers
16 views

Scrape an aspx site using scrapy

I am trying to scrape the site pasted below and iterate through each page, retrieving both the job titles and the dates that each job was posted. I cannot seem to scrape more than the first page. The ...
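ASP.NET WebForms pages usually move between result pages through a postback: the browser re-submits the form with hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION`, plus `__EVENTTARGET` naming the pager control. A plain GET of the next page's URL skips that, which is often why only the first page comes back. A minimal sketch of the first step, collecting every hidden input so it can be posted back, follows; the HTML and the pager name `ctl00$Pager$Next` are hypothetical. In Scrapy, `FormRequest.from_response` is designed to carry the form's fields over from a response for you.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and "name" in attributes:
            self.fields[attributes["name"]] = attributes.get("value") or ""


# Hypothetical page fragment; real __VIEWSTATE values are long encoded strings.
html = """
<form method="post" action="./Jobs.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="hidden" name="__EVENTTARGET" value="" />
</form>
"""

collector = HiddenInputCollector()
collector.feed(html)
collector.close()

# Build the postback payload for the next page (control name is hypothetical).
payload = dict(collector.fields)
payload["__EVENTTARGET"] = "ctl00$Pager$Next"
print(payload)
```

The payload would then be sent as a POST to the form's action URL, and the new page's hidden fields collected again before requesting the page after it.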
0
votes
2 answers
30 views

Python Beautifulsoup webscraping script

I'm new to Python; I just started yesterday. I want to scrape information from this website https://www.letudiant.fr/educpros/responsables-enseignement-superieur/critere-Responsable.html. I want to ...
0
votes
1 answer
13 views

Python Scrapy - "Scraped from" URLs are not the ones set in start_urls

I'm new to Scrapy and I have a question about the URLs that are scraped. I'm trying to scrape a site where every page you go to redirects to the homepage; when you click on a banner you can access ...
2
votes
3 answers
46 views

Selenium can not scrape Shopee e-commerce site using python

I am not able to pull the price of products on Shopee (an e-commerce site). I have taken a look at the problem solved by @dmitrybelyakov (link: Scraping AJAX e-commerce site using python). That ...
0
votes
1 answer
30 views

Web scraping with jsoup is returning only part of the table

I am new to coding. I'm trying to scrape a table with a list of funds from a broker's website. The code is working fine, but it is returning only part of the list (a bit more than the first half of ...
0
votes
1 answer
50 views

Can't scrape <ui-tags> in Python. Unsure why?

I am trying to scrape AFL odds from Betfair (https://www.betfair.com.au/exchange/plus/australian-rules). I am fairly new to web scraping; however, I have managed to scrape odds from other bookies, but I am ...
1
vote
1 answer
37 views

Get value from an iframe with BeautifulSoup

I am trying to get the temperature value out of this website with BeautifulSoup, but when I print out the whole text of the soup, it only shows me an iframe: <iframe frameborder="0" height="100%" src="...
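An `<iframe>` loads a separate document, so its contents are not part of the HTML that was fetched; the usual approach is to read the iframe's `src`, resolve it against the page URL, and request that URL on its own. A minimal sketch of the first two steps with the standard library, using a hypothetical page URL and `src` value:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Record the src attribute of every <iframe>."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


page_url = "https://example.com/weather/station.html"  # hypothetical
html = '<body><iframe frameborder="0" height="100%" src="/widgets/temp.html"></iframe></body>'

collector = IframeSrcCollector()
collector.feed(html)
collector.close()

# Resolve relative src values against the page URL; each result is a separate
# document that has to be requested and parsed on its own to find the value.
iframe_urls = [urljoin(page_url, src) for src in collector.sources]
print(iframe_urls)  # ['https://example.com/widgets/temp.html']
```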
-11
votes
0 answers
37 views

How to trim a link in Python [on hold]

How can I trim this link in Python 3.xx? ['/url?q=http://www.iitk.ac.in/esc101/share/downloads/javanotes5.pdf&sa=U&ved=0ahUKEwi7xJWf_-DhAhUB148KHdKYDmIQFghCMAk&usg=...
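The link looks like a Google search-result redirect, where the real destination is the value of the `q` query parameter. Rather than trimming the string by hand, the query string can be parsed with `urllib.parse`. A sketch, with hypothetical example links since the original is cut off:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical links in the same shape as the one in the question
# ("/url?q=<target>&sa=...&ved=...&usg=..."); the values are made up.
links = [
    "/url?q=http://example.com/notes/lecture1.pdf&sa=U&ved=abc123&usg=xyz",
    "/url?q=https://example.org/page.html&sa=U&ved=def456&usg=uvw",
]


def target_of(link):
    """Return the value of the q= query parameter, or None if it is missing."""
    query = parse_qs(urlparse(link).query)
    values = query.get("q")  # parse_qs maps each key to a list of values
    return values[0] if values else None


targets = [target_of(link) for link in links]
print(targets)  # ['http://example.com/notes/lecture1.pdf', 'https://example.org/page.html']
```

`parse_qs` also percent-decodes the value, so an encoded destination URL comes back in its normal form.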
-2
votes
0 answers
17 views

Web scraping in asp.net/wcf? [on hold]

I want a web app to do web scraping. Currently, I have a Windows app that works like a charm with CefSharp, but my clients are asking me to have the application on the web. I tried to use ...
0
votes
2 answers
42 views

Why does HTML source from Selenium look different than that shown in a web browser’s view?

I am using Python and Selenium to capture the HTML source of a webpage so I can parse it to find a particular element. The source, however, is not the same as what I get when using the “inspect ...
-1
votes
1 answer
40 views

I'm trying to scrape a website for some list items, but Beautiful Soup does not find any on the page

I'm attempting to make a table where I collect all the works of each composer from this page and arrange them by adding a "score", e.g. 1 point for 300th place, 290 points for 10th place, etc. using a ...
0
votes
0 answers
30 views

Webscraping in R with rvest - download xls

I'm trying to collect the data from this site here, but my experience and what I read in other pages and posts are not enough. I'm trying, using R, to select the "download" option, in addition to the ...
0
votes
1 answer
44 views

Including text with <strong> and <em> tags when scraping html using lxml & requests?

I'm scraping text from a webpage using lxml and requests. All of the text that I want is under <p> tags. When I use contents = tree.xpath('//*[@id="storytext"]/p/text()'), contents only includes ...
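The XPath `p/text()` returns only the text nodes that are direct children of each `<p>`, so words wrapped in `<strong>` or `<em>` are skipped. Taking each paragraph's full string value avoids this; in lxml that can be done with the XPath `string()` function or with `text_content()` on `lxml.html` elements. The sketch below shows the same idea with the standard library's `ElementTree.itertext()` on a hypothetical, well-formed snippet, since `ElementTree` itself is not a forgiving HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment mirroring the question's structure.
html = """
<div id="storytext">
  <p>Plain text with <strong>bold</strong> and <em>italic</em> words.</p>
  <p>Second paragraph.</p>
</div>
"""

root = ET.fromstring(html)

# p.text holds only the text before the first child element, so the
# words inside <strong>/<em> (and everything after them) are missing.
direct_only = [p.text for p in root.findall("p")]

# itertext() yields every text fragment inside <p>, including descendants.
full_text = ["".join(p.itertext()) for p in root.findall("p")]

print(direct_only)  # ['Plain text with ', 'Second paragraph.']
print(full_text)    # ['Plain text with bold and italic words.', 'Second paragraph.']
```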
1
vote
1 answer
33 views

Issue with recreating XMLHttpRequest via Python Requests and Postman

I'm trying to shorten links on bit.do using Python or Postman. In Chrome everything works fine, but not with Python/Postman. I get the page, but it contains only an error; however, the request from Chrome and ...
0
votes
1 answer
38 views

How to get values from nested tables using BeautifulSoup

I need to get the name and the price of each row in the sample HTML below; however, when I'm using BeautifulSoup to find_all('tr'), it returns all the tr elements of the main table and the nested tables. What is ...
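`find_all('tr')` searches all descendants, so rows of nested tables are included. Restricting the search to direct children avoids this; in BeautifulSoup, `find_all("tr", recursive=False)` does that (if the parsed tree contains a `<tbody>`, the rows are its children rather than the `<table>`'s). The sketch below shows the same distinction with `ElementTree` on a hypothetical, well-formed table: `iter("tr")` walks all descendants, while `findall("tr")` looks only at direct children.

```python
import xml.etree.ElementTree as ET

# Hypothetical outer table; the third cell of each row holds a nested table.
html = """
<table>
  <tr><td>Apple</td><td>1.20</td><td><table><tr><td>details</td></tr></table></td></tr>
  <tr><td>Pear</td><td>0.90</td><td><table><tr><td>details</td></tr></table></td></tr>
</table>
"""

root = ET.fromstring(html)

all_rows = list(root.iter("tr"))  # every <tr> at any depth: outer and nested
outer_rows = root.findall("tr")   # only the <tr> elements directly under <table>

items = []
for row in outer_rows:
    cells = row.findall("td")  # direct <td> children of this row only
    items.append((cells[0].text, cells[1].text))

print(len(all_rows), len(outer_rows))  # 4 2
print(items)  # [('Apple', '1.20'), ('Pear', '0.90')]
```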
0
votes
0 answers
11 views

Call JavaScript PostBack via another Script while parsing Page

I am writing a plugin for Google Chrome which goes and extracts some data from a webpage and saves it to a local DB. I have covered all the parsing of the pages, but some info is on a different tab and ...
-2
votes
0 answers
19 views

How to print a specific element in a web page that changes over time, in Python 3

I have just started learning Python 3. I want to find out how I could print a changing element on a website in Python, e.g. printing the time from a website whenever we run the program.
0
votes
2 answers
31 views

How do I make Internet Explorer driver invisible using Selenium and VB?

I am using Selenium WebDriver to make some automations. Using Chrome, I can use the headless argument to hide it, but I don't know the argument to hide Internet Explorer. Dim driver As New ...
0
votes
1 answer
20 views

How to get stat (item_scraped_count) using Scrapy?

I want to get the total count of scraped items, but I am always getting an error: from scrapy.stats import stats class MySpider(Spider): name ="myspider" start_urls = ["http://example.com"] ...
0
votes
0 answers
13 views

How many data-packet handling units (for web requests) does a laptop have, and does each core have its own such unit?

I want to know how many data packet handling units a multicore CPU has for web requests. I mean, if I am using multiprocessing to request different websites, then are those websites called ...
1
vote
2 answers
44 views

Scraping with specific criteria when similar classes used in html source

I am trying to scrape the 8 instances of x between td tags in the following: <th class="first"> Temperature </th> <td> x </td> # repeated for 8 lines. There are, however, ...
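When several rows share the same tags and classes, the row's `<th>` label can identify it, and then only that row's `<td>` cells are read. A sketch with `xml.etree.ElementTree` on a hypothetical, well-formed table (three values instead of eight, for brevity); with BeautifulSoup, the comparable step after locating the `<th>` is `find_next_siblings("td")`.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: rows share the same tags and classes,
# so the wanted row is identified by its <th> label instead.
html = """
<table>
  <tr><th class="first"> Humidity </th><td> 40 </td><td> 42 </td></tr>
  <tr><th class="first"> Temperature </th><td> 21 </td><td> 23 </td><td> 19 </td></tr>
</table>
"""


def values_for_label(table, label):
    """Return the stripped <td> texts of the row whose <th> text equals label."""
    for row in table.iter("tr"):
        header = row.find("th")
        if header is not None and (header.text or "").strip() == label:
            return [(td.text or "").strip() for td in row.findall("td")]
    return []  # no row with that label


table = ET.fromstring(html)
print(values_for_label(table, "Temperature"))  # ['21', '23', '19']
```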
-1
votes
1 answer
26 views

Reading web content returns "JS is disabled"

I wrote the following code which reads a webpage contents: string url ="https://hackerone.com/directory?asset_type=URL&order_direction=DESC&order_field=started_accepting_at"; HttpClient ...
-1
votes
1 answer
18 views

Is there a well-known platform to make an Android app from the results of Python web scraping, and what would be the best way to do so?

I am able to easily scrape the content of webpages that list the nearby events in a town with Python, and I would like to use the result to make an Android app out of it. So I was wondering what ...
-1
votes
2 answers
47 views

Is there any reason why my if statement for finding text in a bs4 tag element fails?

I am trying to find and print all the h3 tags which contain the months I am interested in. To do this, I tried to write a for loop over my bs4 object (head) with an if statement within it that specifies to print ...
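Because the code is cut off, this is only a guess at the cause: a membership test such as `month in heading` needs to run against the heading's text (for example the string returned by `get_text()` in BeautifulSoup), not against the tag object. A standard-library sketch that gathers each `<h3>`'s full text, including text inside nested tags, and filters by month; the headings are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the full text of every <h3>, including text in nested tags."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h3 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False

    def handle_data(self, data):
        if self._in_h3:
            self.headings[-1] += data


# Hypothetical headings; the third one wraps the year in a nested tag.
html = "<h3>January 2019</h3><h3>March 2019</h3><h3>June <em>2019</em></h3>"
months_of_interest = ["March", "June"]

parser = HeadingCollector()
parser.feed(html)
parser.close()

# Test the heading's text (a plain string), not the element object itself.
matching = [
    text.strip()
    for text in parser.headings
    if any(month in text for month in months_of_interest)
]
print(matching)  # ['March 2019', 'June 2019']
```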
1
vote
1 answer
37 views

Google Maps place ID using Selenium

from selenium import webdriver import re driver= webdriver.Chrome(executable_path=r"C:\Users\chromedriver") sentence ="chiropractor in maryland" url="https://google.com/search?hl=en&q={}".format(...
1
vote
1 answer
42 views

How to save all images from the page using Beautiful Soup?

I'm trying to get all the images from the website and save them locally using Beautiful Soup. I'm able to get only the images available in the page, but not able to parse the images available after page ...
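For images present in the fetched HTML, the usual steps are to collect each `<img>` tag's URL and resolve it against the page URL before downloading. A minimal standard-library sketch follows; the page URL and markup are hypothetical, and the download step is left out. Images that only appear after the page runs JavaScript are not in this HTML at all. Some sites keep the real URL in a `data-src` attribute for lazy loading; preferring it here is an assumption that has to be checked against the actual site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute image URLs from <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Assumption: prefer data-src (a common lazy-loading convention) over src.
        src = attributes.get("data-src") or attributes.get("src")
        if src:
            self.image_urls.append(urljoin(self.base_url, src))


page_url = "https://example.com/gallery/index.html"  # hypothetical
html = (
    '<img src="thumbs/a.jpg">'
    '<img src="/static/b.png">'
    '<img data-src="c.jpg" src="placeholder.gif">'
)

collector = ImageCollector(page_url)
collector.feed(html)
collector.close()
print(collector.image_urls)
```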
0
votes
1 answer
23 views

How to split an XML API response between Dict and List

I have the following XML response: <?xml version="1.0" encoding="utf-8"?> <export_response xmlns:xsd="" xmlns:xsi="" xmlns=""> <success>true</success> <row_count>2&...
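One way to split such a response with `xml.etree.ElementTree` is to put the single-valued elements into a dict and the repeated elements into a list of dicts. The XML is truncated after `<row_count>`, so the `<rows>`/`<row>` layout and field names below are hypothetical, and the empty `xmlns` declarations are left out of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: the original is cut off after <row_count>,
# so the <rows>/<row> structure here is an assumption.
xml_text = """<?xml version="1.0" encoding="utf-8"?>
<export_response>
  <success>true</success>
  <row_count>2</row_count>
  <rows>
    <row><id>1</id><name>first</name></row>
    <row><id>2</id><name>second</name></row>
  </rows>
</export_response>"""

# Parse bytes so the encoding declaration is handled by the XML parser.
root = ET.fromstring(xml_text.encode("utf-8"))

# Single-valued elements become a dict ...
meta = {
    "success": root.findtext("success") == "true",
    "row_count": int(root.findtext("row_count")),
}

# ... and each repeated <row> becomes a dict inside a list.
rows = [{child.tag: child.text for child in row} for row in root.iter("row")]

print(meta)  # {'success': True, 'row_count': 2}
print(rows)  # [{'id': '1', 'name': 'first'}, {'id': '2', 'name': 'second'}]
```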
-2
votes
3 answers
51 views

Is there another way besides “strip()” and “replace()” to get rid of the extra white space in the data I scraped?

I am pretty new to Python and I am trying to set up a web scraper that gathers data on characters who have died in the show Game of Thrones. I have gotten the data that I want, but I can't seem to get ...
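Besides `strip()` and `replace()`, a common idiom is to split on any whitespace and re-join with single spaces, which collapses internal runs and trims both ends in one step; `re.sub` with `\s+` does the same. A sketch on a hypothetical scraped string:

```python
import re

# Hypothetical scraped cell with newlines, tabs, runs of spaces and a
# non-breaking space (\xa0), which often appears in scraped HTML text.
raw = "\n   Eddard   Stark \t\n  Season 1 \xa0 "

# str.split() with no arguments splits on any run of Unicode whitespace and
# drops empty strings, so re-joining with one space normalizes the text.
clean = " ".join(raw.split())
print(clean)  # Eddard Stark Season 1

# Equivalent with a regular expression: collapse runs, then trim the ends.
clean_re = re.sub(r"\s+", " ", raw).strip()
```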
1
vote
1 answer
34 views

How to pull attribute details from a btnclass element when web scraping

I'm currently trying to pull specific information from within a BTnClass element on a webpage. the specific button and preceding element details are: <div class="m-t-sm"> <button class="...
0
votes
0 answers
30 views

Is there a way to get real-time data generated by JavaScript with HttpWebRequest?

I'm trying to get data with HttpWebRequest. It works with stable data that does not change every second, but when I'm trying to get a timer (which changes every second), I'm not ...
0
votes
1 answer
22 views

How do I authenticate using MSXML2.XMLHTTP and VBA?

I need to authenticate on the endpoint https://graph.microsoft.com/v1.0/me/drive/root/children using MSXML2.XMLHTTP and VBA. I have the access token already but I am struggling to find out the string ...
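Microsoft Graph expects the access token in an `Authorization` header using the `Bearer` scheme. With `MSXML2.XMLHTTP` that header is set with `setRequestHeader` after calling `open` and before `send`, e.g. `http.setRequestHeader "Authorization", "Bearer " & token`. The same request is sketched below in Python to show the exact header form; the token is a placeholder and the request is only built, not sent.

```python
import urllib.request

access_token = "YOUR_ACCESS_TOKEN"  # hypothetical placeholder
url = "https://graph.microsoft.com/v1.0/me/drive/root/children"

# Graph reads the OAuth access token from the Authorization header using the
# "Bearer" scheme; the request object is only constructed here.
request = urllib.request.Request(
    url,
    headers={"Authorization": f"Bearer {access_token}", "Accept": "application/json"},
)
print(request.get_header("Authorization"))  # Bearer YOUR_ACCESS_TOKEN

# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```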
0
votes
2 answers
27 views

How to tell python to look for an element only if it exists?

I want to scrape information about supermarket products, taking into account that some of the info (the origin of the product) isn't always available. I am trying to iterate over a dataframe of ...
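The usual pattern is to look the element up, check whether anything was found, and fall back to a default otherwise. In BeautifulSoup, `find()` returns `None` when nothing matches, and in Selenium `find_elements` (plural) returns an empty list instead of raising. A standard-library sketch of the same check, with hypothetical product markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical product snippets: the second one has no origin element.
products = [
    '<div><span class="name">Apples</span><span class="origin">Spain</span></div>',
    '<div><span class="name">Bread</span></div>',
]


def origin_or_default(fragment, default="N/A"):
    """Return the product origin if the element exists, else a default."""
    element = ET.fromstring(fragment).find("span[@class='origin']")
    # find() returns None when nothing matches, so check before using .text.
    return element.text if element is not None else default


origins = [origin_or_default(p) for p in products]
print(origins)  # ['Spain', 'N/A']
```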
