Remove repeating URLs from Google search results

Discussion in 'Tech Discussion' started by xsliege, Jan 24, 2024.

  1. xsliege

    xsliege Well-Known Member

    Joined:
    Jul 15, 2017
    Messages:
    31
    Likes Received:
    10
    Reading List:
    Link
    Hi, I need to search many sites for information (remembering them is not an option), and I need to filter repeating sites out of the search results so I don't have to double-check sites I've already visited.

    For example, I get these search results:
    www.xyz.com
    www.example.com
    www.xyz.com/something
    www.example.com/other

    How do I automatically filter out subpages like www.xyz.com/something and www.example.com/other so that only ONE URL for each website remains in the search results?
    Some sites have many subpages (especially subpages made for SEO purposes, for example www.xyz.com/location/EU/Spain, www.xyz.com/location/EU/Portugal, etc.), and they muddy the search results.

    Is there some regular expression that can filter out repeating URLs or something? Or am I missing something obvious?
     
  2. gangbuntu

    gangbuntu Well-Known Member

    Joined:
    Jul 9, 2016
    Messages:
    315
    Likes Received:
    251
    Reading List:
    Link
    can't help u there; i gave up on google's search capability (i know that u never mentioned it once) over a decade ago. but, cmiiw, i'll try to break down the requirements:
    • redundant results from web-host should be filtered out; especially those SEO cheats.
    • what about localized variants from the same host (e.g: en.wiki, es.wiki, fr.wiki) ?
    • what about cash-grab/malicious sites that contain jumbles of tags &/ incoherent phrases?
    • what about aggregate/wiki sites (e.g: i'd expect multi hits if i search `isekai` in this site)?

    quite frankly i don't think a simple regex can help, since efforts to judge content similarity would be required; and different people might have different opinions on what constitutes similar/duplicate.
    one of these might be closer to what u want to achieve though: https://github.com/topics/metasearch-engine
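The first requirement alone (one result per web host) needs no content-similarity judgment once the result URLs are available as a plain list. Below is a minimal sketch in Python, assuming the URLs have already been copied or exported from the results page; `host_of` and `first_per_host` are hypothetical helper names, not part of any search engine or tool mentioned in this thread. It treats `www.xyz.com` and `xyz.com` as the same site but deliberately leaves other subdomains (en.wiki vs es.wiki) distinct, since whether those count as duplicates is a matter of opinion.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    # urlsplit only recognises the host when a scheme or '//' is present,
    # so bare entries such as 'www.xyz.com/something' get a '//' prefix.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def first_per_host(urls):
    """Keep only the first URL seen for each host, preserving input order."""
    seen = set()
    kept = []
    for url in urls:
        host = host_of(url)
        if host and host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


# The example list from the first post.
results = [
    "www.xyz.com",
    "www.example.com",
    "www.xyz.com/something",
    "www.example.com/other",
]
print(first_per_host(results))  # ['www.xyz.com', 'www.example.com']
```

This only deduplicates a list after the fact; it does not change what the search engine itself returns.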
     
  3. xiazixin

    xiazixin Well-Known Member

    Joined:
    Dec 7, 2017
    Messages:
    1,409
    Likes Received:
    674
    Reading List:
    Link
    upload_2024-1-25_9-11-59.png
    upload_2024-1-25_9-12-15.png

    upload_2024-1-25_9-14-9.png

    upload_2024-1-25_9-14-55.png

    Hope this will make your life easier.

    Although, dark sites will still be in the dark.
    upload_2024-1-25_9-33-16.png

    there are other functions like
    upload_2024-1-25_9-28-39.png
     
  4. Nom de Plume

    Nom de Plume [Shio’s Disciple] [True Villain] [Equip: Gunblade] Novel Updates Staff

    Joined:
    Oct 20, 2015
    Messages:
    2,693
    Likes Received:
    13,033
    Reading List:
    Link
    These are the ones I use most commonly, but yeah, I don't know of any that would condense all subdomains into single results.
    Sounds like you have a story here (y)
     
  5. gangbuntu

    gangbuntu Well-Known Member

    do i expect to get what i want (even technical questions) in the 1st page? no.

    as i surf the web, there'd be click-bait ads with sensational titles n exotic images.
    i observed google (image-based) search evolution when trying to find what those exotic images really are:
    1. completely unrelated images sharing the same color tones.
    2. random pages containing the ads. i considered myself lucky if i could find the 1st clue within the first 20 pages (over 200 duds to skim through)
    3. won't start searching unless we provide some tags. i gave up once it got to this point.

    ex: (photoshopped) kelimutu lake, house perched on the edge of a cliff

    for years google search results have been skewed based on search history / other stats, while also adhering to local governments' wishes for censorship.
    even in incognito mode, the search history/patterns seemed to be retained (across restarts, days apart)
    the main reason i go incognito is not to hide my history, but to get unbiased result.

    iirc it was `i'm really a superstar` that dropped an obscure term. someone mentioned that when he googled that term, the result was a hentai. when i searched, it was an old man. neither fit the context, so i tried duckduckgo: it was a (chinese?) game that personifies an anime character (k-on in that particular case) with an aircraft.

    it gets even worse as of late with:
    - it prefers my locale in incognito mode
    - annoying auto-complete (to the point that it prevented me from typing); i had to type in notepad n copy-paste the whole term into the search box
    - "i think you mean xyz", followed by the results for xyz; no (for the most part) i don't. i really meant xwz.

    i know that searching is NOT a trivial process, especially as the data grows exponentially (despite the fact that the vast majority of it is simply echo chambers).
    but google and youtube are not the place to learn / gain knowledge/understanding.
    they are the tools/apps to reinforce what people already believe in.

    heck, even a stackoverflow answer with more than 2000 votes could be wrong.
    the prevalence of herd mentality is very concerning.
    i find solace in the people who still have the decency to provide reference(s) for their statements.

    sadly, many people who have given up on google turned to chatgpt.
    they don't care about right / wrong anymore, they just need a fast convincing answer.
     
  6. xiazixin

    xiazixin Well-Known Member

    ChatGPT can't answer any technical questions. Currently I'm having issues with VLAN settings with my ISP, and ChatGPT just won't work. Google search results will give you some random forum post or some random router instruction manual, which also does not work.
    Sometimes a question is just that difficult to get answered.

    For example, if you ask a technical question like "does **** router support OpenWrt", ChatGPT will tell you to check compatibility and to ask on forums whether it's compatible.
     
  7. gangbuntu

    gangbuntu Well-Known Member

    u know that, i know that, but what about the others?

    there were 4 candidates who submitted the same mistakes on an offline mathematical interview question. out of curiosity i asked the 4th candidate: "how can u ensure that ur question is correct?"
    his answer: "it came from chatgpt, therefore it is correct".
    * the 2nd and 3rd failed to reach the interview stage; the 1st was willing to admit (that he was wrong) when the problems were pointed out.

    years ago there was this saying: `programming is stackoverflow`. i've seen a shift to `programming is chatgpt`: some junior n mid developers rely on chatgpt for coding.
     
  8. xiazixin

    xiazixin Well-Known Member

    It's normal. There are plenty of mistakes; the most important thing is to try. Back in the days when I learned programming, I used the library, the physical library. Many of the books were out of date, and some were incorrect. I basically used a pencil to cross out the incorrect part in the book and wrote "incorrect".

    There were some edits by other readers too. Though we don't call those vandalism.