Understanding Information Representation and Markup

Uploaded by

kumardasam333

MAD – 1

WEEK 2
LEC 1: Information representation in a machine
● Markup: a primary part of the UI, governing how the user interacts with the computer.
It deals with how the site looks to the user.
● Information Representation:
○ Computers work only with “bits”
■ Binary digits: 0 and 1
○ Numbers
■ Place value: binary numbers, e.g. 6 = 0110
■ Two’s complement: negative numbers, e.g. -6 = 1010 (in 4 bits)
○ Letters? Arbitrary Text?
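
The two number examples above can be checked with a minimal Python sketch. The helper names `to_twos_complement` and `from_twos_complement` are hypothetical, and the 4-bit width is an assumption taken from the examples 0110 and 1010.

```python
def to_twos_complement(value: int, bits: int = 4) -> str:
    """Return the two's-complement bit string of `value` in a fixed width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking with 2**bits - 1 maps a negative value onto its bit pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Interpret a bit string as a signed two's-complement number."""
    raw = int(pattern, 2)
    # A leading 1 means the pattern stands for a negative number.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


print(to_twos_complement(6))         # 0110
print(to_twos_complement(-6))        # 1010
print(from_twos_complement("1010"))  # -6
```

The same pattern 1010 would mean 10 if read as an unsigned number, which is why the width and the signed/unsigned convention have to be agreed on in advance.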
● Representing Text:
○ Information Interchange:
■ Communicate through machines - either between machines or
between humans
■ Machines only work with bits
■ Standard “encoding”
● Some sequence of bits interpreted as a character
○ Interpretation:
■ What is “0100 0001”?
● String of bits
● Number with value 65 decimal
● Character “A”
■ Matter of Interpretation and Context
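
As a small illustration of the point above, the same eight bits can be read three ways in Python. This is only a sketch; the variable names are made up for the example.

```python
bits = "01000001"            # "0100 0001" written without the space

as_number = int(bits, 2)     # read the bits as an unsigned binary number
as_char = chr(as_number)     # read that number as a character code

print(bits)       # 01000001
print(as_number)  # 65
print(as_char)    # A

# Going the other way: character -> code -> bit string
print(format(ord("A"), "08b"))  # 01000001
```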
● ASCII:
○ American Standard Code for Information Interchange
○ 7-bits: 128 different entities
■ ‘a’ .. ‘z’
■ ‘A’ .. ‘Z’
■ ‘0’ .. ‘9’
■ Special characters: !@#$%^&*()..
○ Why 7-bits?
■ Total number of characters to be represented came to around
60-100 => 7 bits needed (6 bits give only 64 codes)
○ What about other language characters?
■ 1000s of characters needed
■ ASCII fails
● Unicode:
○ Allow codes for more scripts, characters
○ How many?
■ All living languages? All extinct languages? All future
languages?
○ “Universal Character Set” encoding - UCS
■ UCS-2: 2 bytes per character - max 65,536 characters
■ UCS-4: 4 bytes per character - 4 Billion + characters
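
A short Python sketch of why ASCII fails and what a fixed-width encoding costs. It assumes Python's built-in `utf-32-be` codec as a stand-in for UCS-4, and the Devanagari letter is just one arbitrary non-ASCII example.

```python
latin = "A"
devanagari = "अ"   # DEVANAGARI LETTER A, code point U+0905

print(ord(latin))        # 65, fits in 7 bits
print(ord(devanagari))   # 2309, outside the ASCII range 0..127

try:
    devanagari.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")

# Fixed width: every character costs 4 bytes, common or not.
print(len(latin.encode("utf-32-be")))       # 4
print(len(devanagari.encode("utf-32-be")))  # 4

# Number of distinct codes available with 2 bytes and with 4 bytes
print(2 ** 16)   # 65536
print(2 ** 32)   # 4294967296
```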

LEC 2: Efficiency of Encoding


● Efficiency
○ Most common language on the Web?
■ English, by a wide margin
○ Should all characters be represented with the same number of bits?
○ Example:
■ Text document with 1000 words, approx 5000 characters
(including spaces)
● UCS-4 encoding: 32b x 5000 = 160,000 bits
● ASCII encoding 8b x 5000 = 40,000 bits
● Original 7-bit ASCII for English: 7b x 5000 = 35,000 bits
● Minimum needed to encode just ‘a’ - ‘z’, numbers and
some special characters: could fit in 6 bits: 30,000 bits
● Optimal coding based on frequency of occurrence:
○ ‘e’ is the most common letter, ‘t’, ‘a’, ‘o’, …
○ Huffman or similar encoding: ~10,000-20,000 bits,
possibly less
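
The comparison above can be reproduced on a small scale with the sketch below. It is a minimal Huffman construction, not a production encoder: the function name `huffman_code_lengths` and the sample sentence are hypothetical, and the result depends entirely on the text used.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return {character: code length in bits} for a Huffman code built from `text`."""
    freq = Counter(text)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tie_breaker, {char: depth so far})
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)      # two lightest subtrees
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


sample = "the quick brown fox jumps over the lazy dog and then eats the tree"
lengths = huffman_code_lengths(sample)
freq = Counter(sample)
huffman_bits = sum(freq[ch] * lengths[ch] for ch in freq)

n = len(sample)
print("characters   :", n)
print("UCS-4 (32b)  :", 32 * n)
print("ASCII (8b)   :", 8 * n)
print("ASCII (7b)   :", 7 * n)
print("6-bit code   :", 6 * n)
print("Huffman      :", huffman_bits)
```

Frequent characters such as the space and 'e' end up with short codes, rare ones with long codes, so the Huffman total comes out below the fixed-width totals. The code table itself would also have to be stored or agreed on, which this sketch ignores.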
● Solvable in general?
○ Impossible just to encode by actual character frequency: depends on
text
■ Just use compression methods like “zip” instead
○ But can encoding be a good halfway point?

Example:
● Use 1 byte for the most common characters
● Group others according to frequency, have “prefix” codes to indicate

● Prefix coding
● Example

● UTF-8
○ Use 8 bits for most common characters: ASCII subset
■ All ASCII documents are automatically UTF-8 compatible
○ All other characters can be encoded based on prefix string
○ More difficult for text processor:
■ First check prefix
■ Linked list through chain of prefixes possible
■ Still more efficient for majority of documents
○ Most common encoding in use today
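
The prefix idea can be observed directly with Python's built-in `utf-8` codec. The sketch below is illustrative: the four sample characters are arbitrary, and `utf8_sequence_length` is a hypothetical helper that only inspects the lead byte.

```python
samples = ["A", "é", "अ", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    bit_groups = " ".join(format(b, "08b") for b in encoded)
    print(f"U+{ord(ch):04X}  {len(encoded)} byte(s)  {bit_groups}")


def utf8_sequence_length(first_byte: int) -> int:
    """Length of a UTF-8 sequence, judged from the prefix bits of its first byte."""
    if first_byte >> 7 == 0b0:        # 0xxxxxxx: ASCII, single byte
        return 1
    if first_byte >> 5 == 0b110:      # 110xxxxx: start of a 2-byte sequence
        return 2
    if first_byte >> 4 == 0b1110:     # 1110xxxx: start of a 3-byte sequence
        return 3
    if first_byte >> 3 == 0b11110:    # 11110xxx: start of a 4-byte sequence
        return 4
    raise ValueError("continuation byte (10xxxxxx) or invalid lead byte")


# An ASCII-only document has the same bytes in ASCII and in UTF-8.
print("hello".encode("ascii") == "hello".encode("utf-8"))  # True
```

A text processor has to run a check like `utf8_sequence_length` before it knows where the next character starts, which is the extra work the notes refer to.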

LEC 3: What is Markup?


● Content:
○ Markup is a way of using cues or codes in the regular flow of text to
indicate how text should be displayed.
○ Very useful to make the display of text clear and easy to understand

● Markup:
● Types of Markup:
○ Paper that introduced markup and its types: Coombs et al.,
“Markup Systems and the Future of Scholarly Text
Processing”, Communications of the ACM, 1987
■ Presentational
● WYSIWYG: directly format output and display
● Embed codes not part of regular text, specific to the
editor
■ Procedural
● Details on how to display
○ change font to large, bold
○ Skip 2 lines, indent 4 columns
■ Descriptive
● This is a <title>, this is a <heading>, this is a
<paragraph>
● Very effective
● Examples:
○ MS Word, Google Docs etc:
■ User interface focused on “appearance”, not meaning
■ WYSIWYG: direct control over styling
■ Often leads to complex formatting and loss of inherent meaning
○ LaTeX, HTML (general *ML)
■ Focus on meaning
■ More complex to write and edit, not WYSIWYG in general
● Semantic Markup
○ Content vs Presentation
○ Semantics
■ Meaning of text
■ structure or logic of the document

LEC 4: Introduction to HTML


● HyperText Markup Language
○ HTML first used by Tim Berners-Lee in original Web at CERN (~1989)
○ Considered an application of SGML (Standard Generalized Markup
Language)
■ Strict definitions on structure, syntax, validity
○ HTML meant for browser interpretation
■ Very forgiving, loose validity checks
■ Best effort to display
● Example

<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>

<p>My first paragraph.</p>

</body>
</html>

● Tags
○ <h1></h1> - paired tags
○ Angle brackets < >
○ Closing tag with /
○ Location specific: <!DOCTYPE>: only at the very start of the document
○ Case-insensitive
● Nesting
○ <em><strong>Hello</strong></em>
○ Strong is applied first and then em is applied. (Bold + italics)
○ Renders as: Hello (in bold italics)
● Invalid
○ <em><strong>Hello</em></strong>
○ <em><strong>Hello</em>
○ <em><strong>Hell<o/em></strong>
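
The nesting rule can be sketched as a stack check using Python's standard `html.parser` module. `NestingChecker` and `is_properly_nested` are hypothetical names, and the sketch assumes every tag is a paired tag, so void elements such as <br> would be wrongly reported.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Checks that paired tags are closed in the reverse order of opening."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.ok = True

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # A closing tag must match the most recently opened tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.ok = False


def is_properly_nested(markup: str) -> bool:
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Valid only if no mismatch occurred and nothing was left open.
    return checker.ok and not checker.stack


print(is_properly_nested("<em><strong>Hello</strong></em>"))  # True
print(is_properly_nested("<em><strong>Hello</em></strong>"))  # False
print(is_properly_nested("<em><strong>Hello</em>"))           # False
```

Browsers do not reject the invalid cases; they make a best effort to display them, so a check like this is closer to what a validator does.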
● Presentation vs Semantics
○ <strong>Hello</strong>
○ <b>Hello</b>
○ Both render the same way by default: Hello in bold
○ Both are valid; neither is strictly better
○ Choose the one whose semantics fit: <strong> marks importance, <b> only
changes appearance
● Timelines
○ SGML based
■ 1989 - HTML original
■ 1995 - HTML 2
■ 1997 - HTML 3,4
○ XML based
■ XHTML - 1997 - mid 2010s
○ HTML5
■ first release 2008
■ W3C recommendation - 2014
● HTML5
○ Block elements: <div>
○ Inline elements: <span>
○ Logical elements: <nav>, <footer>
○ Media: <audio>, <video>
○ Remove “presentation only” tags:
■ <center>
■ <font>

● Document Object Model:

<html>
<head>
<title>My title</title>
</head>
<body>
<h1>A heading</h1>
<a href="link">Link Text</a>
</body>
</html>
● DOM
○ Tree structure representing logical layout of document
○ Direct manipulation of tree is possible
○ Application Programming Interfaces (APIs)
■ Canvas
■ Offline apps
■ Web storage
■ Drag and Drop
■ …
○ JavaScript is the primary means of manipulating the DOM
○ CSS used for styling
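
To make the tree structure concrete, here is a minimal sketch that builds a tree from the example document with Python's standard `html.parser`. It is not the browser DOM API: `Node`, `TreeBuilder`, and `outline` are hypothetical names, and attributes are ignored.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a simple element tree; assumes well-formed, paired tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node                  # descend into the new element

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent   # climb back up

    def handle_data(self, data):
        self.current.text += data.strip()    # whitespace-only text is dropped


def outline(node, depth=0):
    """Return the tree as a list of indented lines."""
    label = node.tag + (f': "{node.text}"' if node.text else "")
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


html = """<html>
<head>
<title>My title</title>
</head>
<body>
<h1>A heading</h1>
<a href="link">Link Text</a>
</body>
</html>"""

builder = TreeBuilder()
builder.feed(html)
builder.close()
print("\n".join(outline(builder.root)))

# Direct manipulation of the tree: append a new paragraph node to <body>.
body = builder.root.children[0].children[1]
paragraph = Node("p", body)
paragraph.text = "Added later"
body.children.append(paragraph)
```

In a browser, the equivalent manipulation is done from JavaScript on the real DOM; the sketch only shows that the document is a tree whose nodes can be walked and changed.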

LEC 5: Introduction to Styling


● Markup vs Style

<h1>Hello</h1>

● Separation of Styling
○ Style hints in separate blocks
■ Separate files
○ Themes
○ Style sheets
■ Specify presentation information
○ Cascading Style Sheets (CSS)
■ Allow multiple definitions
■ Latest definition takes precedence (when the selectors are otherwise equal)

LEC 6: Types of CSS styling and Responsive Websites


● Inline CSS
○ Directly add style to the tag
○ Example:
<h1 style="color:blue;text-align:center;">A heading</h1>
● Internal CSS
○ Embed a <style> block inside the <head> element
○ All <h1> tags in the document will look the same - centrally modified

<style>
body {
background-color: linen;
}

h1 {
color: maroon;
margin-left: 40px;
}
</style>

● External CSS
○ Extract common content for reuse
○ Multiple CSS files can be included
○ Latest definition of style takes precedence
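
The "latest definition takes precedence" rule can be sketched as a simple merge. This is a deliberately simplified model, not the real CSS cascade: it assumes plain tag selectors of equal specificity and ignores !important and inheritance. The function `resolve` and the value `navy` are hypothetical.

```python
# Each sheet is a list of (selector, {property: value}) in source order.
external_sheet = [("h1", {"color": "maroon", "margin-left": "40px"})]
internal_style = [("h1", {"color": "navy"})]
inline_style = {"color": "blue", "text-align": "center"}


def resolve(tag, sheets, inline=None):
    """Simplified cascade: a later declaration of a property overwrites an earlier one."""
    computed = {}
    for rules in sheets:                       # sheets in the order they are loaded
        for selector, declarations in rules:
            if selector == tag:
                computed.update(declarations)
    computed.update(inline or {})              # the style attribute is applied last
    return computed


print(resolve("h1", [external_sheet, internal_style]))
# {'color': 'navy', 'margin-left': '40px'}

print(resolve("h1", [external_sheet, internal_style], inline_style))
# {'color': 'blue', 'margin-left': '40px', 'text-align': 'center'}
```

Properties that are never redefined, such as margin-left here, survive from the earlier sheet; only conflicting declarations are replaced.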
● Responsive Design
○ Mobile and Tablets have smaller screens
■ Different form factors
○ Adapt to screen - Respond
○ CSS controls styling - HTML controls content
● Bootstrap
○ Commonly used framework
■ Originated from Twitter
■ Widely used now
○ Standard styles for various components
■ Buttons
■ Forms
■ Icons
○ Mobile first: highly responsive layout
● JavaScript?
○ Interpreted language brought into the browser
○ Not related to Java despite the name - standardized as ECMAScript
○ Why?
■ HTML is not a programming language
■ CSS is not meant as a programming language
○ Would still like to have “programmability” inside browser
○ Not part of the core presentation requirements
■ Very useful
