THE INTERNET.
1. Internet vs. websites.
The Internet is a giant store of information, and there are some websites on the Internet which specialise in particular sorts of information. So Twitter is for people’s short comments and opinions. YouTube is for videos. eBay is for notices of things for sale. Facebook started out as mostly a directory of individual personal profiles (as did MySpace, which preceded it). Rightmove is for property for sale. TotalJobs is for job vacancies. JustEat is for information about nearby vendors of take-away food. Uber is for offers (and requests) for car and driver transport.
The question I have about all of this is: why does this content need to be on these websites? Why is it not sufficient for it just to be (somewhere) on the Internet? The question in short is: “why do we need websites when we have the internet?”.
Why not just put your content on the Internet, tag it, and let some kind of search engine aggregator find it? If you have an amusing video you wish to share with the world, you would just put it on the Internet and tag it ‘video’ and ‘amusing’ and anything else you want. Then, when people are looking for amusing videos, they will instruct their search engine to find things which have been tagged accordingly. So no specialist website is needed.
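Here is a rough sketch in Python of what I mean by tag-based discovery. The URLs and tags are invented for illustration; a real aggregator would crawl the Internet rather than consult a dictionary, but the filtering idea is the same.

```python
# A minimal sketch of tag-based discovery: content lives anywhere,
# each piece carries author-chosen tags, and a search finds matches.
# All URLs and tags here are invented examples.

tagged_content = {
    "http://example.org/my-video": {"video", "amusing", "cats"},
    "http://example.org/holiday-clip": {"video", "travel"},
    "http://example.org/essay": {"text", "opinion"},
}

def find_by_tags(index, wanted):
    """Return every URL whose tags include all the wanted tags."""
    return [url for url, tags in index.items() if wanted <= tags]

print(find_by_tags(tagged_content, {"video", "amusing"}))
```

No specialist website appears anywhere in this: the "YouTube" part of the job is just the search function.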
When you put your video on YouTube all that happens is you get some webspace somewhere, ie on the YouTube website, to put your video. Then the YouTube website includes that in its search when you do a search on that website. The YouTube website is therefore just a combination of webspace provider and search engine. But those things are there anyway independently of YouTube. So the alternative to having YouTube would be for a user to get their own webspace and put their video there and let search engines find it as described above.
You might say that the reason people don’t do it like this is because there’s no easy way for them to get webspace for this purpose. But it’s not that difficult. And to the extent that it is difficult the reason for this difficulty is that websites like YouTube do it for you and so they reduce the demand for webspace with which users might do it for themselves.
All specialised websites could be replaced on the same principle. The main point is to mark the content on the webpage so that search engines know what it is. You would probably need an agreed page format so that any search engine could present the results in a comparative format, the way specialised websites currently do. So if you had a webpage relating to an item for sale, it would need to contain the item description, price, seller name and shipping costs, all in a set format, so that a search engine could then lay out the results neatly from different webpages for the search engine user.
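A sketch of that “agreed page format” idea: if every seller’s page exposed the same named fields, any search engine could collect listings from different pages and lay them out side by side. The field names here are my own assumption, not any real standard.

```python
# Invented listings, as if harvested from two sellers' own webpages,
# each following the same assumed field format.

listings = [
    {"item": "Wool coat", "price": 80.0, "seller": "Alice", "shipping": 5.0},
    {"item": "Rain coat", "price": 45.0, "seller": "Bob", "shipping": 3.5},
]

def comparison_table(found):
    """Lay out listings from different pages in one comparative view,
    cheapest total cost first."""
    rows = sorted(found, key=lambda l: l["price"] + l["shipping"])
    return [f'{l["item"]}: £{l["price"] + l["shipping"]:.2f} total ({l["seller"]})'
            for l in rows]

for row in comparison_table(listings):
    print(row)
```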
In other words putting content anywhere on the Internet and marking it “this is a video (designed to be found by search engines)” replaces putting the content on some website which says “all the content on this website is videos”. Each of these methods results in the same outcome.
The required outcome for the user is a filtering down of information. Websites like Amazon and eBay are like universal shopping places. So, say I wanted a coat of a particular description. On Amazon I can specify size, colour, fabric, design and other things. And Amazon finds all items that fit my description and puts them all on a single page so I can compare them.
But if people selling coats put their item’s information anywhere on the Internet then, in principle, a search engine could find all of this and filter it the way the Amazon website does. But if a search engine was doing it then it would include all coats for sale, not just the ones on any particular website.
The problem with the filtering process happening on Amazon rather than on the Internet is that Amazon make an enormous amount of money charging sellers commission on sales. They can do this because, with the current set-up, they know that a seller is more likely to get a sale on Amazon than on any website of their own. It’s like Amazon have obtained ownership of that part of the Internet where sales happen. Which is contrary to the general principle that no one owns the Internet.
1a. Tags and Folders
The Mac OS has folders and it also has a function for putting tags (for files).
You can put files in folders. So you can put all files that are about birds into a folder. And when you want to see all files that are about birds, you just go to that folder.
An alternative to folders is using tags on files. So you would tag all the files that are about birds with the tag ‘bird’. And then when you want to see all files that are about birds you use the tag search function in the Finder app and it will list all those files. On the whole using tags is better than folders, because with tags a file can be tagged with more than one tag, but a file can only be put in one folder. For example I could tag a file as being about birds, and also as being about things that are native to Australia. If a file was about Australian birds it would get both tags.
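The difference can be put in a few lines of Python. The file names are invented; the point is that a folder gives each file exactly one home, while tags let one file appear under several headings.

```python
# Folders force one location per file; tags allow many labels per file.

folders = {"birds": ["kookaburra.txt"], "australia": []}   # one home each

tags = {
    "kookaburra.txt": {"birds", "australia"},   # both labels at once
    "robin.txt": {"birds"},
    "uluru.txt": {"australia"},
}

def files_with_tag(tagged, tag):
    """List every file carrying the given tag, like a Finder tag search."""
    return sorted(f for f, ts in tagged.items() if tag in ts)

print(files_with_tag(tags, "birds"))      # kookaburra and robin
print(files_with_tag(tags, "australia"))  # kookaburra and uluru
```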
So, on the Internet, is it the case that a webpage on a website is like a file in a folder? All webpages are on websites and all websites are “on the Internet”. So (like in the Mac OS) why not just have webpages on the Internet. With tags. Which are then collated, aggregated, by some kind of search engine.
2. Ordinary knowledge.
The Internet makes ordinary knowledge more available. Before now it was easier to find out about some extra-ordinary historical event from a hundred years ago than it was to find out some ordinary knowledge saying what the weather was like in your town a couple of weeks ago. This was because it was never cost-effective to publish in print the latter sort of bulk ephemeral data. But you can publish it electronically on the Internet for everyone to look up. If you’re interested in that sort of thing.
3. What is the Internet?
Do I really understand what the Internet is? Could I explain what it is to someone who was entirely unfamiliar with it?
I have only a vague idea of how the Internet is different from the World Wide Web. It’s something like that the Internet is the structure and the WWW is the content. So this means that there is more to the Internet than the contents of the WWW. I assume this means that you can have content on the Internet but where this content isn’t a website, ie isn’t part of the WWW. But I’m not sure what this would look like. Also: can you have a website which isn’t part of the WWW?
What is the Internet and how should I understand it as a user? I want an understanding of the Internet which corresponds to (and is at the same level as) the understanding that I have of the computer I use. So, I understand my computer to be a device where I can control (give instructions to) applications, such as word processing, calculating, Internet browsing and suchlike. And where I have access to a place (the hard drive memory store) where I can put files (the output of the applications) and access these files again when I want by navigating around folders.
What would a similar account of what the Internet is be like? At the moment I imagine the Internet to involve an increase in the quantity of information that I can access through me being connected to some sort of network of devices. But I don’t have much of an idea of the nature of that network connection.
I understand other ways of increasing the quantity of information that I can access on my computer. For example plugging in an external hard drive. What are the details of the corresponding action that increases the quantity of information I can access when I am connected to the Internet? In other words what exactly is “connecting to the Internet”. It’s something more sophisticated than plugging in an external memory device.
Sometimes the Internet is described as “computers connected to each other”. So that they can communicate with each other. But that’s a different idea. It might be the case that computer A can communicate with computer B without it being the case that computer A can access the information (on some hard drive or whatever) that computer B can access.
When I am networked with another computer what does that mean exactly? When I’m online (“plugged into the Internet”), then I have, in effect, access to lots of other information stores: an immense number of other hard drive storage devices like my own. In other words “http://” (in a browser application) is like “c:\” (in the file system of the Windows operating system). But that can’t be the whole story because, unlike files on my computer, when I access webpages I don’t have the ability to change them. Also that connection doesn’t enable me to use the processor in the other computer. So it’s not that sort of connection.
So the difference is that content on the Internet is largely files like .html accessed largely via browser software. But on my computer content is largely files like .doc accessed largely via software like word-processing.
So: it could have been that the Internet kept the computer way of doing things. So there would have been no browser software. Content on the Internet would be like it was on your computer: files like .doc in folders. You would have accessed those files via a word processor. And searched for things on the Internet like you do on your computer, using CTRL-F or whatever. The other characteristics of files on the Internet could have been introduced within this arrangement. So, if you don’t want the people accessing the .doc files to be able to change them, then just make the files read only.
Conversely: it could have been that your computer had the Internet way of doing things. So on your computer (even off-line) you don’t use .doc or Word. You just use .html and something that lets you edit such files.
Why is it that making a webpage means getting involved with HTML code, which is the ‘under the bonnet’ type of stuff, but making a word processing file (like with MS Word) doesn’t? Although it could. (If I open an rtf file using a plain text editor I can see codes which are akin to the kinds of codes you get in HTML.) There is no webpage editor equivalent of Word. Or maybe there is and I don’t know about it.
These things are converging. But not quite the same yet. Same would be if I could just create a webpage on the Internet. And then write on it as if it was a .doc file. And other people could see the same webpage in a browser window. They could even see me writing on it. It would update in real time. (Or maybe it ought to be that what they see is only updated when I click save on the page that I am writing on.) The closest thing to this that I can find is https://walloftext.co.
Another thing is how does the connection to the Internet work? With my computer hard drive I know I am just permanently connected to it. But the connection to Internet content seems different. For example the hardware is different.
But wait a minute! I’m not sure I’ve made clear what exactly my question is in the foregoing. (Maybe (as with every single other thing on this blogsite) my non-understanding is more apparent than real.) Maybe I don’t know what my question really is. What I do know is that when I try to find out how the Internet works (by searching on the Internet!) I find descriptions of technicalities such as “packet switching” and “top level domain names”. But that isn’t what I want to know. It’s rather like if I ask: how does a washing machine work? how does it clean clothes? And I get an answer which explains how the electrics and mechanics of the machine work. How the drum turns and how the water goes in and out. But this doesn’t really answer the question of how it cleans clothes. Or does it?
3a.
I think my not understanding is rather like the way I don’t understand how virus software gets onto a computer. A computer just does as it is instructed by the user. So if the user never tells the computer to get some virus software and run it then the computer should just never do that! Of course I understand how you acquire a virus if you carelessly download and run some software from an untrusted source without checking what that software actually is. But apparently you can get a virus just from visiting a webpage on the Internet.
Microsoft HERE say that: “Malware can use known software vulnerabilities to infect your PC. A vulnerability is like a hole in your software that can give malware access to your PC. When you go to a website, it can try to use those vulnerabilities to infect your PC with malware.” But this isn’t saying anything really. It’s just saying that virus software can get through a hole. Which is rather vague and also just obviously true!
If clicking on a link on a webpage can put some virus software on my computer then I’m amazed that there aren’t more viruses going around. Anyone could set up a webpage and put a dangerous link in it. I guess the hard part would be getting people to visit that page.
4. Indexing.
In general, if you have a lot of information then you need some method of getting to the particular bit of information that you need. So, suppose I had a thousand books in my library. But they were all identical looking from the outside, all in plain grey binding. And they were in a large room on many shelves but not in any particular order. And I know that one of them is about the Geography of Australia but I don’t know where it is! Which means I can’t easily get to it and so I might as well not have it. In general it’s only true that I’ve got something if I have some reliable way of getting to it.
If I have got the books on shelves then it’s possible that I could know where they all are by just remembering whereabouts they are. If I have got too many books to remember where they all are then I could put them in alphabetical order by subject along the shelves. So books on Geography would come before books on History. And within the block of books about Geography the book “Geography of Australia” would come before the book “Geography of Brazil”.
On the other hand I could just not bother with the effort of putting the books together and I could create an index list instead. So I would leave the books on the shelves in any state of disorder, not ordered by subject. Then I write out a list of the books and put this list in alphabetical order by subject. So I’m putting the list in order instead of doing that with the books themselves. So on the list the books about Geography would come before the books on History. This list is an index.
Because it’s the index that’s in order by the subject rather than the books this means there is an additional requirement that the index must tell me where the book is. So on the index where it lists the book “Geography of Australia” the index will also say where the book is: which shelf and which position on that shelf.
The good thing about having the written index ordered rather than the books themselves is that then a book can be listed more than once. If there is a book about the birds of Australia then this can be listed on the index under the Birds heading and then also under the Australia heading. This kind of thing would not have been possible if I had ordered the books instead of the list.
A further thing is that if the index is a computer text file rather than being written on paper then it does not need to be in any order at all. Because if I wanted to find the book about the birds of Australia I would just use Ctrl-F (or Cmd-F) to find it on the index page that way. The Ctrl-F search replaces the need for the index list to be ordered.
Either way each entry would need to say what subject heading (or headings) the book falls under. The author of the book on Australian birds might have decided to call their book “The Antipodean Aviary” which doesn’t mention either of the words “bird” or “Australia”. So the index needs to somehow link that title to the headings “Australia” and “Birds”. If the index was printed and in order by subject then these words would be in the headings. So the heading “Birds” would come after the heading “Australia” in the index. And under each of these two headings would appear the name of the book “The Antipodean Aviary”. It would be listed twice.
If the index was an unordered computer file then the book could be listed once and the entry would maybe look something like: “The Antipodean Aviary (Birds) (Australia)”.
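That unordered-index idea can be sketched directly. The titles and shelf locations below are invented; each entry carries its subject headings and its location, and a Ctrl-F style search does the work that alphabetical ordering used to do.

```python
# An unordered index file: each entry names the book, its subject
# headings, and where the book physically is. No ordering needed.

index = [
    "The Antipodean Aviary (Birds) (Australia) -- shelf 7, position 3",
    "Geography of Brazil (Geography) -- shelf 2, position 12",
    "A History of Rome (History) -- shelf 4, position 1",
]

def lookup(entries, term):
    """Return every index line mentioning the term, like Ctrl-F."""
    return [e for e in entries if term.lower() in e.lower()]

print(lookup(index, "Australia"))
```

Note that the search finds “The Antipodean Aviary” under “Australia” even though the title itself never mentions Australia, because the heading is attached to the entry.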
Suppose I had a load of books about different animals, and the written index was ordered by box number like this: “One: cats, Two: dogs, Three: horses, Four: goats, Five: rabbits” and so on up to “One hundred: ants”. That would be no good, because the index is ordered by location rather than by subject, so it doesn’t help me find the book about goats. And if the entries got put into alphabetical order by the number-words, like this (for example, just the first twelve): “Eight, Eleven, Five, Four, Nine, One, Seven, Six, Three, Ten, Twelve, Two”, again that would be no good: it’s still not ordered by subject.
I remember once at an office where I used to work there were a lot of rarely used things lying around on desks and on the floor and in cupboards, everywhere! And we were always wasting time finding them when we needed them. So I put them all in boxes and I numbered the boxes and wrote down in a computer file what was in each box. Then, instead of rummaging through the boxes, you could just do “ctrl F” on the computer. The important point here was that I didn’t make any attempt to put things of a certain sort together. (For example, I didn’t put all stationery items into one box.)
On the Internet when you search on Google you are not searching the Internet. Rather you are searching an index that Google has created. On the index the entry for each indexed item has a hyperlink to tell you where the content is. I don’t know how the indexing is done. Google’s software goes through all the pages on the Internet and then it decides what subject headings to apply to each page. But how does it do that? It can’t be just based on word mentions. Suppose Google gets a page which mentions cats a lot. It can’t simply index it under ‘cats’ because that page might be a review of T.S. Eliot’s Old Possum’s Book of Practical Cats. Which isn’t really about cats at all.
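The index-not-the-Internet point can be made concrete with what is usually called an inverted index: a table mapping subject headings to page addresses, which is what the search actually consults. In this sketch the headings are assigned by hand (side-stepping the hard problem of deciding them automatically), and the pages are invented.

```python
# Pages with hand-assigned subject headings. The Eliot review mentions
# cats a lot but is not indexed under 'cats', which is the point.

pages = {
    "http://example.org/cat-care": {"cats", "pets"},
    "http://example.org/eliot-review": {"poetry", "book reviews"},
}

# Build the inverted index: heading -> set of page addresses.
inverted = {}
for url, headings in pages.items():
    for h in headings:
        inverted.setdefault(h, set()).add(url)

# A "search" is just a lookup in the index, not a trawl of the pages.
print(inverted.get("cats", set()))
```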
I’m old enough to remember the beginnings of the Internet and it never occurred to me that the central pillar of it would be a search engine (namely, Google). And at the beginning I don’t think it was. I assumed there would be a giant index or catalogue. And it turns out that that is what lies behind the search engine. But it’s not transparent. What if the index that Google created were more open, so individuals could decide where on the index their website was? Actually I think that is what happens: when you create a website you can include certain tags telling Google how to index it.
See section 1a above about folders and tags.
5. Messaging.
The other thing that happens on the Internet is messaging. The basic form of which is email. The question I had in section 3. above applies to this as well. How does it work?
When somebody sends an email what happens? Suppose Jack sends Mary an email. She gets it when she goes online. Is it somewhere in between? If (as above) I try to understand it in terms of what I already know, namely my experience of using a computer, then it might be something like this: there is a folder which only Mary can access, and somehow Jack connects to it and puts a message in there. He’s allowed to do that, but he can’t actually read the folder, because then he would be able to see all the messages that other people had sent to Mary.
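That “folder only Mary can read, but anyone can drop things into” picture can be sketched as follows. The names and the access rule are my own invention, just to make the asymmetry explicit: deposit is open to everyone, reading is restricted to the owner.

```python
# A sketch of a mailbox as a drop-only folder: anyone may deposit,
# only the owner may read.

class Mailbox:
    def __init__(self, owner):
        self.owner = owner
        self._messages = []           # the folder Jack cannot look inside

    def deposit(self, sender, text):
        """Anyone may drop a message in (write-only access)."""
        self._messages.append((sender, text))

    def read(self, who):
        """Only the owner may read the accumulated messages."""
        if who != self.owner:
            raise PermissionError(f"{who} may not read {self.owner}'s mail")
        return list(self._messages)

box = Mailbox("Mary")
box.deposit("Jack", "Hello Mary")
print(box.read("Mary"))
```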
6. Website vs Messaging.
One way of thinking about it is that The Internet is these two things: (a) you go and get content off a webpage via a browser (or some other app) or (b) you get it sent to you, for example by email.
What if there were only one of these things? If there were just email and no websites, then information that is normally on a website would be emailed to you and you would store it in a file on your computer. Any changes would be emailed to you later too. Conversely, if there were no email but just websites, then, instead of people emailing you, they would put a message on a webpage that only you had reading access to.
But putting information on the internet is not the same as informing people of that information. Unless you are going to assume that they are frequently checking that website for updates.
If the information you want is quite static then websites are better. For example if you wanted the complete novels of Charles Dickens then it’s best to just put them on a website. But what if you wanted to know what new things people were saying about the novels of Charles Dickens. If this was on a website you would need to check it frequently. And this might mean a lot of wasted time and effort as there might not be something on there every time you checked. So it would be better if you had this emailed to you.
Suppose you wanted to know about anything new on a hundred different webpages. Then you would have to check them all regularly. In which case you’d rather just tell the webpage owners to email you with any changes.
The combination solution is what is known as a feed (RSS being the standard example). This is where some software automatically checks many different websites for updates and aggregates the results onto one page for you to access. This makes things easier because then you only have to check one page instead of a hundred. It is rather like having a little personal search engine. Google Alerts does something similar. But all of this seems rather clunky. RSS only works if the website is configured accordingly. I’m surprised there isn’t something more streamlined.
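The core of the aggregation idea is just change detection: remember a fingerprint of each page, re-fetch periodically, and report only the pages whose content has changed. Here is a sketch; the page contents are passed in as plain strings standing in for real HTTP fetches.

```python
import hashlib

def fingerprint(content):
    """A short, stable fingerprint of a page's content."""
    return hashlib.sha256(content.encode()).hexdigest()

def check_for_updates(pages, seen):
    """pages: url -> current content; seen: url -> last fingerprint.
    Returns the URLs whose content changed, updating the fingerprints."""
    changed = []
    for url, content in pages.items():
        fp = fingerprint(content)
        if seen.get(url) != fp:
            changed.append(url)
            seen[url] = fp
    return changed

seen = {}
print(check_for_updates({"http://example.org/a": "v1"}, seen))  # new page
print(check_for_updates({"http://example.org/a": "v1"}, seen))  # unchanged
```

Run against a hundred pages, the output is one short list of what actually changed, which is the “one page instead of a hundred” effect described above.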
But there is still a distinction between a stream of ongoing information on the one hand. And on the other hand a store of all the (accumulated) information. It’s like the difference between the content of newspapers and the content of a history book.
Also this idea of a feed seems underused. There doesn’t seem to be any efficient way for me to keep up to date with some very precise, random collection of things that I might have an interest in.
Here’s a question I asked on Reddit once:
>>
Is there such a thing as an application that monitors the internet in order to give me some kind of generalised abstract feed. So I can keep up to date with the latest news on a disparate collection of things. For example I might want to know about the following random things: the price of a certain product at my local store, new books written by a particular author, changes to the bus timetable for a service I use, research into a medical condition that I have, new construction works in my area, whether or not it’s going to rain tomorrow. All this information will be on the internet somewhere but scattered around and so not immediately available to me.
It would be good if I had one application that aggregated all the relevant news and kept me informed about all these things. Maybe there is one and I’ve missed it! If there isn’t then there should be something like it. There’s no point information being on the internet if I never find out about the bits that would interest me. Or if the only way this can happen is for me to constantly monitor lots of different websites myself. Which is time-consuming and inefficient.
Some of the websites which contain the information I want will have an alert facility where I can give them my email address and they will tell me any news. But, again, this is a rather inefficient way of doing things.
<<
Even more simply: suppose I just want to be told if it is going to rain tomorrow. I don’t want to have to go to a website and manually consult a forecast webpage every day. How do I do that? You can probably get an app for that. But then are you supposed to get a separate app for each of the things you want to know?
Part of the problem is that there seems to be no standardised language that internet information is in. So there will be a store website with data about prices of items. But that isn’t, so to speak, machine readable. But, given AI advancements, there should be an app that I can instruct: “every day at 10:00 go onto the website store.com, find their listing for standard widget and if the price ever falls below £25 then notify me”.
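The rule itself is trivial to state in code; the hard, non-standardised part is extracting the price from the page in the first place. In this sketch the extraction step is faked with a plain number, and the wording of the alert is my own.

```python
# A sketch of the rule "notify me if the price falls below £25".
# The price would really come from scraping or an API; here it is
# just passed in as a number.

def price_alert(current_price, threshold=25.0):
    """Return a notification string when the rule fires, else None."""
    if current_price < threshold:
        return f"Price alert: standard widget now £{current_price:.2f}"
    return None

print(price_alert(27.99))   # no alert yet
print(price_alert(22.50))
```

Everything difficult about the app I describe lives in the part this sketch fakes: turning an arbitrary store webpage into the number that goes into `price_alert`.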
7. Website design.
Webpages on a website are in a structure, rather like a book has chapters, sections and subsections. On a well organised website this structure will be quite clear, in such a way that whatever webpage you are on, you will know where that page is in the website structure. Rather like knowing where you are on a map. This can be done by information that’s actually on the page itself.
Consider this URL: “website.com/book-three/part-one/chapter-six”. Here the webpage structure of the website is being used to let the user know where they are. But this isn’t necessary. You could have a website where the URL for all pages was of the format “website.com/[some random number]”. I think it’s better if this kind of thing is stated on the page itself.
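When a site does encode its structure in the URL path, the “where am I?” trail (what web designers call breadcrumbs) can be recovered mechanically. A sketch, which of course only works for sites structured this way and not for the random-number kind:

```python
from urllib.parse import urlparse

def breadcrumbs(url):
    """Turn a structured URL path into a trail of (name, path) steps."""
    path = urlparse(url).path.strip("/")
    parts = path.split("/") if path else []
    trail, crumbs = "", []
    for part in parts:
        trail += "/" + part
        crumbs.append((part, trail))
    return crumbs

print(breadcrumbs("http://website.com/book-three/part-one/chapter-six"))
```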
In other words every page should have a link back to the home page. Or some link that will lead you to the homepage even if it’s via other pages. Sometimes you land on a webpage you got to from a search engine. But there’s nothing on the page to say whereabouts on the website it is. The content of the page might refer to some other previous page because there is a link from that page to the one you are on. But there is no link from the page you are on to that other page. Maybe the person who set up the website just assumed you would use the browser back button to get to the previous page. But if you try doing that after having arrived there from a search you just end up back on the search results page!
Conversely you shouldn’t have a webpage on a website that isn’t mentioned and linked to from the homepage (or from some other page that is itself mentioned and linked from the homepage). That would be like having a chapter in a book which wasn’t listed in the list of chapters at the beginning of the book. Like a secret chapter!
8. As a sort of conclusion and about “what is the Internet?”.
At first thought I think it’s just a large store of information. Just an online library. But it’s more than that, although I can’t say exactly how. It will be something to do with the other side of things, ie messaging, as described in section 6 above. So, it’s something like this: the content which the Internet is composed of is constantly being altered by the users of the Internet, and this changing is the messaging.
For example reviews of a product. Or a Twitter conversation.
I think to myself: when somebody publishes a book, what if I thought of that as a message to me, which I have to collect at a bookshop by paying a fee?
9. Other things.
With the Internet all information is available in a way it wasn’t before. But maybe we haven’t got used to this yet. Once someone sent me a message mentioning how they had written some blog posts. But they didn’t send me links to those posts. And I thought: this is like if I was sat next to them and they told me that they had bought a new wristwatch. But they didn’t show it to me. Even though they were wearing it at the time. That would be odd.
I got a website of my own. But it’s not very easy to use. I assumed that having a website would mean that I could write in the browser application as well as read. When I say “write in a browser” I mean just like I can write in the Word application.